Abstract 



We survey recent (and not so recent) results concerning arrangements of lines, 
points and other geometric objects and the applications these results have in 
theoretical computer science and combinatorics. The three main types of problems 
we will discuss are: 

1. Counting incidences: Given a set (or several sets) of geometric objects 
(lines, points, etc.), what is the maximum number of incidences (or intersections) 
that can exist between elements in different sets? We will see several 
results of this type, such as the Szemerédi-Trotter theorem, over the reals 
and over finite fields, and discuss their applications in combinatorics (e.g., 
in the recent solution of Guth and Katz to Erdős' distance problem) and in 
computer science (in explicit constructions of multi-source extractors). 

2. Kakeya type problems: These problems deal with arrangements of lines 
that point in different directions. The goal is to try and understand to what 
extent these lines can overlap one another. We will discuss these questions 
both over the reals and over finite fields and see how they come up in the 
theory of randomness-extractors. 

3. Sylvester-Gallai type problems: In problems of this type, one is presented 
with a configuration of points that contains many 'local' dependencies 
(e.g., three points on a line) and is asked to derive a bound on the dimension 
of the span of all points. We will discuss several recent results of this 
type, over various fields, and see their connection to the theory of locally 
correctable error-correcting codes. 

Throughout the different parts of the survey, two types of techniques will 
make frequent appearances. One is the polynomial method, which uses polynomial 
interpolation to impose an algebraic structure on the problem at hand. The other 
recurring techniques come from the area of additive combinatorics. 






Contents 



1 Overview 

2 Counting Incidences Over the Reals 

2.1 The Szemerédi-Trotter theorem 

2.2 Applications of Szemerédi-Trotter over $\mathbb{R}$ 

2.3 The Elekes-Sharir framework 

2.4 The Polynomial Method and the Joints Conjecture 

2.5 The Polynomial Ham-Sandwich theorem 

2.6 The Guth-Katz incidence theorem for lines in $\mathbb{R}^3$ 

2.7 Application of the Guth-Katz bound to sum-product estimates 

3 Counting Incidences Over Finite Fields 

3.1 Ruzsa Calculus 

3.2 Growth in $\mathbb{F}_p$ 

3.3 The Balog-Szemerédi-Gowers theorem 

3.4 Szemerédi-Trotter in finite fields 

3.5 Multi-source extractors 

4 Kakeya sets 

4.1 Kakeya sets in $\mathbb{R}^n$ 

4.2 Kakeya sets in finite fields 

4.3 Randomness Mergers from Kakeya sets 

5 Sylvester-Gallai type problems 

5.1 Sylvester-Gallai type theorems over the reals 

5.2 Rank lower bound for design matrices 

5.3 Sylvester-Gallai over finite fields 

5.4 Locally Correctable Codes 

References 






Chapter 1 

Overview 



Consider a finite set of points, P, in some vector space and another set L of lines. An 
incidence is a pair $(p, \ell) \in P \times L$ such that $p \in \ell$. There are many types of questions one 
can ask about the set of incidences and many different conditions one can impose on the 
corresponding set of points and lines. For example, the Szemerédi-Trotter theorem (which 
will be discussed at length below) gives an upper bound on the number of possible incidences. 
More generally, in this survey we will be interested in a variety of problems and theorems 
relating to arrangements of lines and points and the surprising applications these theorems 
have, in theoretical computer science and in combinatorics. The term 'incidence theorems' is 
used in a very broad sense and might include results that could fall under other categories. 
We will study questions about incidences between lines and points, lines and lines (where an 
incidence is a pair of intersecting lines), circles and points and more. 

Some of the results we will cover have direct and powerful applications to problems in 
theoretical computer science and combinatorics. One example in combinatorics is the recent 
solution of Erdős' distance problem by Guth and Katz [GK10b]. The problem is to lower 
bound the number of distinct distances defined by a set of points in the real plane and the 
solution (which is optimal up to logarithmic factors) uses a clever reduction to a problem on 
counting incidences of lines [ES10]. 

In theoretical computer science, incidence theorems (mainly over finite fields) have been 
used in recent years to construct extractors, which are procedures that can transform weak 
sources of randomness (that is, distributions that have some amount of randomness but 
are not completely uniform) into completely uniform random bits. Extractors have many 
theoretical applications, ranging from cryptography to data structures to metric embedding 
(to name just a few) and the current state-of-the-art constructions all use incidence theorems 
in one way or another. The need to understand incidences comes from trying to analyze 
simple looking constructions that use basic algebraic operations. For example, how 'random' 
is $X \cdot Y + Z$, when $X, Y, Z$ are three independent random variables each distributed uniformly 
over a large subset of $\mathbb{F}_p$? 

We will see incidence problems over finite fields, over the reals, in low dimension and in 
high dimension. These changes in field/dimension are pretty drastic and, as a consequence, 
the ideas appearing in the proofs will be quite diverse. However, two main techniques will 
make frequent appearances. One is the 'polynomial method', which uses polynomial interpolation 
to try and 'force' an algebraic structure on the problem. The other recurring techniques 
will come from Additive Combinatorics. These are general tools to argue about sets in Abelian 
groups and the way they behave under certain group operations. These two techniques are 
surprisingly flexible and can be applied in many different scenarios and over different fields. 

The survey is divided into four chapters, following this overview chapter. Chapter 2 
will be devoted to problems of counting incidences over the real numbers (Szemerédi-Trotter 
and others) and will contain applications mostly from combinatorics (including the 
Guth-Katz solution to Erdős' distance problem). Chapter 3 will be devoted to the 
Szemerédi-Trotter theorem over finite fields and its applications to the explicit constructions 
of multi-source extractors. Chapter 4 will be devoted to Kakeya type problems, which 
deal with arrangements of lines pointing in different directions (over finite and infinite fields). 
The applications in this chapter will be to the construction of another variant of extractors 
- seeded extractors. The fifth and final chapter will deal with arrangements of points with 
many collinear triples. These are related to questions in theoretical computer science having 
to do with locally correctable error correcting codes. More details and definitions relating to 
each of the aforementioned chapters are given in the next four subsections of this overview, 
which serves as a road map to the various sections. 

This survey is aimed at both mathematicians and computer scientists and could serve as 
a basis for a one semester course. Ideally, each chapter should be read from start to finish 
(the different chapters are mostly independent of each other). We only assume familiarity 
with undergraduate level algebra, including the basics of finite fields and polynomials. 

Notations: We will use $\lesssim$, $\gtrsim$ and $\sim$ to denote (in)equality up to multiplicative absolute 
constants. That is, $X \lesssim Y$ means 'there exists an absolute constant $C$ such that $X \le CY$'. 
In some places, we opt to use instead the computer science notations of $O(\cdot)$, $\Omega(\cdot)$ and $\Theta(\cdot)$ 
to make some expressions more readable. So $X = O(Y)$ is the same as $X \lesssim Y$, $X = \Omega(Y)$ 
is the same as $X \gtrsim Y$ and $X = \Theta(Y)$ is the same as $X \sim Y$. This allows us to write, for 
example, $X = 2^{\Omega(n)}$ to mean: 'there exists an absolute constant $C$ such that $X \ge 2^{Cn}$'. 

Sources: Aside from research papers there were two main sources that were used in the 
preparation of this survey. The first is a sequence of posts on Terry Tao's blog which cover 
a large portion of Chapter 2 (see e.g. [Tao09]). Ben Green's lecture notes on additive 
combinatorics [Gre09] were the main source in preparing Chapter 3. Both of these 
sources were indispensable in preparing this survey and I am grateful to both authors. 
Chapter 2: Counting incidences over the reals 

Let $P$ be a finite set of points and $L$ a finite set of lines in $\mathbb{R}^2$. Let 

$$I(P,L) = \{(p,\ell) \in P \times L \mid p \in \ell\}$$

denote the set of incidences between $P$ and $L$. A basic question we will ask is how big can 
$I(P, L)$ be. The Szemerédi-Trotter (ST) theorem [ST83] gives the (tight) upper bound of 

$$|I(P,L)| \lesssim (|L| \cdot |P|)^{2/3} + |L| + |P|.$$

We begin this chapter in Section 2.1 with two different proofs of this theorem. The first 
proof, presented in Section 2.1.1, is due to Tao [Tao09] (based on [CEG+90b] and similar to 
the original proof of [ST83]) and uses the method of cell partitions. The idea is to partition 
the two dimensional plane into cells, each containing a bounded number of points/lines and to 
argue about each cell separately. This uses the special 'ordered' structure of the real numbers 
(this proof strategy is also the only one that generalizes to the complex numbers [Tot03]). 
The second proof, presented in Section 2.1.2, is due to Székely [Sze97]. It uses the crossing 
number inequality for planar drawings of graphs and is perhaps the most elegant proof known 
for this theorem. This proof can also be adapted easily to handle intersections of more 
complex objects such as curves. We continue in Section 2.2 with some simple applications 
of the ST theorem to geometric and algebraic problems, including proving sum product 
estimates and counting distances between sets of points. 

Sections 2.3 to 2.6 are devoted to the proof of the Guth-Katz theorem on Erdős' distance 
counting problem. This theorem, obtained in [GK10b], says that a set of $N$ points in the real 
plane defines $\gtrsim N/\log N$ distinct distances. This gives an almost complete answer 
to an old question of Erdős (the upper bound has a factor of $\sqrt{\log N}$ instead of $\log N$). The 
tools used in the proof are developed over several sections which contain several other related 
results. 

In Section 2.3 we discuss the Elekes-Sharir framework [ES10] which reduces distance 
counting to a question about incidences of a specific family of lines in $\mathbb{R}^3$, much in the 
spirit of the ST theorem. Sections 2.4 and 2.5 introduce the two main techniques used in 
the proof of the Guth-Katz theorem. In Section 2.4 we introduce for the first time one 
of the main characters of this survey - the polynomial method. As a first example of the 
power of this method, we show how it can be used to give a solution to another beautiful 
geometric conjecture - the Joints conjecture [GK10a]. Here, one has a set of lines in $\mathbb{R}^3$ 
and wants to upper bound the number of joints, or non-coplanar intersections of three lines 
or more. In Section 2.5 we introduce the second ingredient in the Guth-Katz theorem - 
the polynomial Ham-Sandwich theorem. This technique, introduced by Guth in [Gut08], 
combines the polynomial method with the method of cell partitions. As an example of how 
this theorem is used we give a third proof of the ST theorem which was discovered recently 
[KMS11]. 

Section 2.6 contains a relatively detailed sketch of the proof of the Guth-Katz theorem 
(omitting some of the more technical algebraic parts). The main result proved in this section 
is an incidence theorem upper bounding the number of pairwise intersections in a set of $N$ 
lines in $\mathbb{R}^3$. If we don't assume anything, $N$ lines can have $\gtrsim N^2$ intersections (an intersection 
is a pair of lines that intersect). An example is a set of $N/2$ horizontal lines and $N/2$ vertical 
lines, all lying in the same plane. If we assume, however, that the lines are 'truly' in 3 
dimensions, in the sense that no large subset of them lies in two dimensions, we can get a 
better (and tight) bound of $\lesssim N^{3/2} \log N$. This theorem then implies the bound on distinct 
distances using the Elekes-Sharir framework. 

In the last section of this chapter, Section 2.7, we see yet another beautiful application 
of the three dimensional incidence theorem of Guth and Katz, obtaining optimal bounds in 
the flavor of the sum product theorem [IRNR11]. 

Chapter 3: Counting incidences over finite fields 

This chapter deals with the analog of the Szemerédi-Trotter theorem over finite fields and its 
applications. When we replace the field $\mathbb{R}$ with a finite field $\mathbb{F}_q$ of $q$ elements things become 
much more tricky and much less is known (in particular there are no tight bounds). Assuming 
nothing on the field, the best possible upper bound on the number of incidences between 
$N$ lines and $N$ points is $\sim N^{3/2}$, which is what one gets from only using the fact that two 
points determine a line (using a simple Cauchy-Schwarz calculation). However, if we assume 
that $\mathbb{F}_q$ does not contain large sub-fields (as is the case, for example, if $q$ is prime) one can 
obtain a small improvement of the form $N^{3/2 - \epsilon}$ for some positive $\epsilon$, provided $N \ll p^2$. This 
was shown by Bourgain, Katz and Tao as an application of the sum product theorem over 
finite fields [BKT04]. The sum product theorem says that, under the same conditions on 
subfields, for every set $A \subset \mathbb{F}_q$ of size at most $q^{1-\alpha}$ we have $\max\{|A + A|, |A \cdot A|\} \ge |A|^{1+\alpha'}$, 
where $\alpha'$ depends only on $\alpha$. The set $A + A$ is defined as the set of all elements of the form 
$a + a'$ with $a, a' \in A$ ($A \cdot A$ is defined in a similar way). 
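
The definitions of $A + A$ and $A \cdot A$ can be explored numerically. The following is a minimal sketch, assuming the prime field $\mathbb{F}_p$ is modelled as integers modulo a prime; the prime 1009, the two test sets and the function names are illustrative choices, not taken from the survey. An arithmetic progression has a small sumset and a geometric progression has a small product set, but in each case the other set is noticeably larger.

```python
def sumset(a, p):
    """Return A + A, computed in the integers modulo p."""
    return {(x + y) % p for x in a for y in a}


def productset(a, p):
    """Return A * A, computed in the integers modulo p."""
    return {(x * y) % p for x in a for y in a}


p = 1009  # a prime, chosen only for illustration
arithmetic = set(range(1, 21))                 # 1, 2, ..., 20
geometric = {pow(2, k, p) for k in range(20)}  # 2^0, ..., 2^19 modulo p

for name, a in [("arithmetic", arithmetic), ("geometric", geometric)]:
    # Print |A|, |A + A| and |A * A| for each test set.
    print(name, len(a), len(sumset(a, p)), len(productset(a, p)))
```

The sum product theorem is a statement about all sets below a size threshold, so a computation like this only illustrates the phenomenon on two structured examples.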

The proof of the finite field ST theorem is given in Sections 3.1 to 3.4. Section 3.1 describes 
the machinery called 'Ruzsa calculus' - a set of useful claims for working with sumsets. 
In Section 3.2 we prove a theorem about growth of subsets of $\mathbb{F}_p$ (we will only deal with prime 
fields) which is a main ingredient of the proof of the ST theorem. Section 3.3 proves the 
Balog-Szemerédi-Gowers theorem, a crucial tool in this proof and in many other results in 
additive combinatorics. Finally, Section 3.4 puts it all together and proves the final result. 
We note that, unlike previous expositions (and the original [BKT04]), we opt to first prove 
the ST theorem and then derive the sum product theorem from it as an application. This is 
not a crucial matter but it seems to simplify the proof of the ST theorem a bit. 

As an application of these results over finite fields we will discuss, in Section 3.5, the 
theory of multi-source extractors coming from theoretical computer science. We will see 
how to translate the finite field ST theorem into explicit mappings which transform 'weak' 
structured sources of randomness into purely random bits. More precisely, suppose you are 
given samples from several (at least two) independent random variables and want to use 
them to output uniform random bits. It is not hard to show that a random function will do 
the job, but finding explicit (that is, efficiently computable) constructions is a difficult task. 
Such constructions have applications in theoretical computer science, in particular in the area 
of de-randomization, which studies the power of randomized computation vs. deterministic 
computation. 

We will discuss in some detail two representative results in this area: the extractors 
of Barak, Impagliazzo and Wigderson for several independent blocks [BIW06], which were 
the first to introduce the tools of additive combinatorics to this area, and Bourgain's two-source 
extractor [Bou05]. Both rely crucially on the finite field Szemerédi-Trotter theorem of 
[BKT04]. 

Chapter 4: Packing lines in different directions — Kakeya sets 

This chapter deals with a somewhat different type of theorems that describe the way lines in 
different directions can overlap. In Sections 4.1 and 4.2 we will discuss these questions over 
the real numbers and over finite fields, respectively. In Section 4.3 we will discuss applications 
of the finite field results to problems in theoretical computer science. 

A Kakeya set $K \subset \mathbb{R}^n$ is a compact set containing a unit line segment in every direction. 
These sets can have measure zero. An important open problem is to understand the minimum 
Minkowski or Hausdorff dimension¹ of a Kakeya set. This question reduces in a natural way 
to a discrete incidence question involving a finite set of lines in many 'sufficiently separated' 
directions. The Kakeya conjecture states that Kakeya sets must have maximal dimension 
(i.e., have dimension $n$). The conjecture is open in dimensions $n \ge 3$ and was shown to have 
deep connections with other problems in Analysis, Number Theory, PDE's and others (see 
[Tao01]). 

The most successful line of attack on this conjecture was initiated by Bourgain [Bou99] 
and later developed by Katz and Tao [KT02] and uses tools from Additive Combinatorics. 
In Section 4.1 we will discuss Kakeya sets over the reals and prove a $\ge (4/7)n$ bound on the 
Minkowski dimension, which is very close to the best known lower bound of $(0.596...)n$. The 
underlying additive combinatorics problem that arises in this context is upper bounding the 
number of differences $a - b$, for pairs $(a, b) \in G \subset A \times B$ in some graph $G$, as a function of 
the number of sums (or, more generally, weighted sums) on the same graph. We will not 
discuss the applications of the Euclidean Kakeya conjecture since they are out of scope for 
this survey (we are focusing on applications in discrete mathematics and computer science). 
Even though we will not directly use additive combinatorics results developed in Chapter 3, 
they will be in the background and will provide intuition as to what is going on. 

Over a finite field $\mathbb{F}_q$ a Kakeya set is a set containing a line in every direction (a line will 
contain $q$ points). It was conjectured by Wolff [Wol99] that the minimum size of a Kakeya 
set is at least $C_n \cdot q^n$ for some constant $C_n$ depending only on $n$. We will see the proof 
of this conjecture (obtained by the author in [Dvi08]) which uses the polynomial method. 
An application of this result, described in Section 4.3, is a construction of seeded extractors, 
which are explicit mappings that transform a 'weak' random source into a close-to-uniform 
distribution with the aid of a short random 'seed' (since there is a single source, the extractor 
must use a seed). A specific question that arises in this setting is the following: Suppose Alice 
and Bob each pick a point $X, Y \in \mathbb{F}_q^n$ ($X$ for Alice, $Y$ for Bob). Consider the random variable 
$Z$ computed by picking a random point on the line through $X, Y$. If both Alice and Bob 
pick their points independently at random then it is easy to see that $Z$ will also be random. 
But what happens when Bob picks his point $Y$ to be some function $Y = F(X)$? Using the 
connection to the Kakeya conjecture one can show that, in this case, $Z$ is still sufficiently 
random in the sense that it cannot hit any small set with high probability. More formally, 
this requires proving a variant of the Kakeya conjecture over finite fields with lines replaced 
by low degree curves. 

¹ For a definition see Section 4.1. 
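
The Alice and Bob experiment can be written down concretely. The following is a minimal sketch, assuming $q$ is prime so that $\mathbb{F}_q$ can be modelled as integers modulo $q$; the function names and the parametrization $Z = (1-t)X + tY$ of the line are illustrative assumptions, not the construction analyzed in Section 4.3.

```python
import random


def point_on_line(x, y, t, q):
    """Return (1 - t) * x + t * y coordinate-wise modulo the prime q."""
    return tuple(((1 - t) * xi + t * yi) % q for xi, yi in zip(x, y))


def sample_z(x, y, q, rng=random):
    """Pick a uniformly random point on the line through x and y (x != y)."""
    return point_on_line(x, y, rng.randrange(q), q)


q, n = 7, 3
x = (1, 2, 3)          # Alice's point (hypothetical)
y = (4, 0, 6)          # Bob's point (hypothetical); could also be F(x)
line = {point_on_line(x, y, t, q) for t in range(q)}
print(len(line), sample_z(x, y, q) in line)
```

The interesting case in the text is when `y` is computed from `x`, which this sketch can simulate by replacing the fixed `y` with any function of `x`.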

Chapter 5: From local to global - Sylvester-Gallai type theorems 

The Sylvester-Gallai (SG) theorem says that, in a finite set of points in $\mathbb{R}^n$, not all on the 
same line, there exists a line intersecting exactly two of the points. In other words, if for 
every two points u, v in the set, the line through u, v contains a third point in the set, then 
all points are on the same line. Besides being a natural incidence theorem, one can also look 
at this theorem as converting local geometric information (collinear triples) into global upper 
bounds on the dimension (i.e., putting all points on a single line, which is one dimensional). 
We will see several generalizations of this theorem, obtained in [BDYW11], in various settings. 
For example, assume that for every point $u$ in a set of $N$ points there are at least $N/100$ 
other points v such that the line through u, v contains a third point. We will see in this case 
that the points all lie on an affine subspace of dimension bounded by a constant. The proof 
technique here is different than what we have seen so far and will rely on convex optimization 
techniques among other things. These results will be described in Section 5.1 with the main 
technical tool, a rank lower bound for design matrices, proved in Section 5.2. 
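
The statement of the SG theorem can be checked by brute force on small examples. The following is a minimal sketch, assuming distinct points in the plane with integer coordinates (so that collinearity tests are exact); the function names and the two test configurations are illustrative. It lists all 'ordinary' lines, that is, lines through exactly two points of the set.

```python
from itertools import combinations


def collinear(p, q, r):
    """True if three integer points in the plane lie on a common line."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])


def ordinary_lines(points):
    """Return all pairs (p, q) whose line contains no third point of the set."""
    return [
        (p, q)
        for p, q in combinations(points, 2)
        if not any(collinear(p, q, r) for r in points if r != p and r != q)
    ]


grid = [(x, y) for x in range(3) for y in range(3)]  # 3 x 3 grid, not all on one line
on_a_line = [(t, 2 * t) for t in range(5)]           # all points collinear

print(len(ordinary_lines(grid)), len(ordinary_lines(on_a_line)))
```

The grid is not contained in a line, so the theorem guarantees at least one ordinary line; the collinear configuration has none.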

In Section 5.3 we will consider questions of this type over a finite field and see how the 
bounds are weaker in this case. In particular, under the same assumption as above (with 
$N/100$) the best possible upper bound on the dimension will be $\lesssim \log_q(N)$, where $q$ is 
the characteristic of the field [BDSS11]. Here, we will again rely on tools from additive 
combinatorics and will use results proved in Chapter 3. 

In Section 5.4 we will see how questions of this type arise naturally in computer science 
applications involving error correcting codes which are 'locally correctable'. A (linear) 
Locally-Correctable Code (LCC) is a (linear) error correcting code in which each symbol of a possibly 
corrupted codeword can be corrected by looking at only a few other locations (in the same 
corrupted codeword). Such codes are very different from 'regular' error correcting codes (in 
which decoding is usually done in one shot for all symbols) and have interesting applications 
in complexity theory². 



² They are also very much related to Locally Decodable Codes (LDCs), which are discussed at length in the 
survey [Yek11]. 
Chapter 2 

Counting Incidences Over the Reals 



2.1 The Szemerédi-Trotter theorem 



Let $L$ be a finite set of lines in $\mathbb{R}^2$ and let $P$ be a finite set of points in $\mathbb{R}^2$. We define 

$$I(P,L) = \{(p,\ell) \in P \times L \mid p \in \ell\}$$



to be the set of incidences between $P$ and $L$. We will prove the following result of Szemerédi 
and Trotter [ST83]. 

Theorem 2.1.1 (ST theorem). Under the above notations we have 

$$|I(P,L)| \lesssim (|P| \cdot |L|)^{2/3} + |L| + |P|.$$

We will use $\lesssim$, $\gtrsim$ and $\sim$ to denote (in)equality up to multiplicative absolute constants. 

The following example shows that this bound is tight. Let $L$ be the set of $N \sim M^3$ lines 
of the form $\{(x,y) \in \mathbb{R}^2 \mid y = ax + b\}$ with $a \in [M]$, $b \in [M^2]$. Let 
$P = \{(x,y) \in \mathbb{R}^2 \mid x \in [M], y \in [2M^2]\}$ be a set of $N \sim M^3$ points. Observe that each line $\ell \in L$ intersects $P$ in $\gtrsim M$ 
points (for each $x \in [M]$, $y = ax + b \le 2M^2$). This gives a total of $M^4 \sim N^{4/3}$ incidences. 
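
The tightness example is small enough to verify by direct enumeration. The following is a minimal sketch, assuming $[M]$ denotes $\{1, \ldots, M\}$; the function names are illustrative. It builds the lines and the point grid and counts incidences, which can then be compared with $(|P| \cdot |L|)^{2/3}$.

```python
def grid_example(m):
    """Build the lines y = a*x + b and the point grid from the tightness example."""
    lines = [(a, b) for a in range(1, m + 1) for b in range(1, m * m + 1)]
    points = {(x, y) for x in range(1, m + 1) for y in range(1, 2 * m * m + 1)}
    return lines, points


def count_incidences(lines, points, m):
    """Count pairs (point, line) with the point on the line.

    Every point of the grid has x in [M], so scanning x over [M] on each
    line finds all of its incidences.
    """
    return sum((x, a * x + b) in points for a, b in lines for x in range(1, m + 1))


for m in (2, 3, 4):
    lines, points = grid_example(m)
    inc = count_incidences(lines, points, m)
    # Columns: M, |L|, |P|, number of incidences, (|L| * |P|)^(2/3)
    print(m, len(lines), len(points), inc, (len(lines) * len(points)) ** (2 / 3))
```

In this construction the incidence count is exactly $M^4$, which matches $(|P| \cdot |L|)^{2/3}$ up to a constant factor.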

As a step towards proving the ST theorem we prove the following claim which gives an 
'easy' bound on the number of incidences. It is 'easy' not just because it has a simple proof 
but also because it only uses the fact that every two points define a single line and every pair 
of lines can intersect in at most one point (these facts hold over any field). The proof of the 
claim will use the Cauchy-Schwarz inequality which says that 

$$\left(\sum_{i=1}^{k} a_i b_i\right)^2 \le \left(\sum_{i=1}^{k} a_i^2\right) \cdot \left(\sum_{i=1}^{k} b_i^2\right)$$

whenever $a_i, b_i$ are positive real numbers. 
Claim 2.1.2. Let $P, L$ be as above. Then we have the following two bounds: 

$$|I(P,L)| \lesssim |P| \cdot |L|^{1/2} + |L|$$

and 

$$|I(P,L)| \lesssim |L| \cdot |P|^{1/2} + |P|.$$

Proof. We will only prove the first assertion (the second one follows using a similar argument 
or by duality). The only geometric property used is that through every two points passes 
only one line. First, observe that 

$$|I(P,L)| \le |P|^2 + |L|. \tag{2.1}$$

To see this, count first the lines that have at most one point in $P$ on them. These lines 
contribute at most $|L|$ incidences. The rest of the lines have at least two points in $P$ on each 
line. The total number of incidences on these lines is at most $|P|^2$ since otherwise there would 
be a point $p \in P$ that lies on $> |P|$ lines and each of these lines must have one additional 
point on it and so there are more than $|P|$ points - a contradiction. 

We now bound the number of incidences. We use $1_{p \in \ell}$ to denote the indicator function 
which is equal to 1 if $p \in \ell$ and equal to zero otherwise. 

\begin{align}
|I(P,L)|^2 &= \left(\sum_{\ell \in L} \sum_{p \in P} 1_{p \in \ell}\right)^2 \tag{2.2} \\
&\le |L| \cdot \sum_{\ell \in L} \left(\sum_{p \in P} 1_{p \in \ell}\right)^2 \quad \text{(Cauchy-Schwarz)} \tag{2.3} \\
&= |L| \cdot \sum_{p_1, p_2 \in P} \sum_{\ell \in L} 1_{p_1 \in \ell} \cdot 1_{p_2 \in \ell} \tag{2.4} \\
&\le |L| \cdot \left(|I(P,L)| + |P|^2\right) \tag{2.5} \\
&\le |L|^2 + 2|L| \cdot |P|^2, \tag{2.6}
\end{align}

which implies the bound. Here (2.5) splits the sum into pairs with $p_1 = p_2$, which contribute 
$|I(P,L)|$, and pairs with $p_1 \ne p_2$, which lie on at most one common line, and (2.6) uses (2.1). □ 



2.1.1 Proof using cell partitions 

The first proof of the ST theorem we will see uses the idea of cell partitions and is perhaps the 
most direct of the three proofs we will encounter. The proof we will see is due to Tao [Tao09] 
(based loosely on [CEG+90b]) and is similar in spirit to the original proof of Szemerédi and 
Trotter. The idea is to use the properties of the real plane to partition it into small regions 
such that each region will intersect a small fraction of the lines in our set $L$. This allows us to 
'amplify' the easy bound (Claim 2.1.2) to a stronger (indeed, optimal) bound by applying it 
on separated smaller instances of the problem. 
Lemma 2.1.3. For every $r \ge 1$ there exists a set of $O(r)$ lines plus some additional line 
segments not containing any point in $P$ that partition $\mathbb{R}^2$ into at most $O(r^2)$ regions (convex 
open sets) such that the interior of each region is incident to at most $O(|L|/r)$ lines in $L$. 

We will sketch the proof of this lemma later. Before that, let's see how it implies the ST 
theorem: First we can assume w.l.o.g that 

$$|L|^{1/2} \ll |P| \ll |L|^2$$

(we use $A \ll B$ to mean that $A \le c \cdot B$ for some sufficiently small constant $c$). If not, then 
the bound in the ST theorem follows from Claim 2.1.2. We will apply Lemma 2.1.3 with some 
$r$ to be chosen later. Let $R$ be the set of lines defining the partition (recall that there are 
some additional line segments not counted in $R$ that do not contain points in $P$). For each 
cell $C$ we apply Claim 2.1.2 to bound the number of incidences in this cell (the cell does not 
include the boundary). We get that a cell $C$ can have at most $O(|P \cap C| \cdot (|L|/r)^{1/2} + |L|/r)$ 
incidences. Summing over all cells we get that 

$$|I(P, L)| \le |I(P, L \cap R)| + O(|P||L|^{1/2}/r^{1/2} + |L|r) + O(|L|r)$$

where the first term counts the incidences of points with lines in $R \cap L$, the second term counts 
incidences in the open cells and the third term counts the incidences of lines not in $R$ with 
points in the cell boundary (each line not in $R$ has at most $O(r)$ incidences with points on $R$). 
Setting 

$$r \sim |P|^{2/3}/|L|^{1/3}$$

we get that 

$$|I(P,L)| \le |I(P, L \cap R)| + O(|P|^{2/3}|L|^{2/3}).$$

Since $|P| \ll |L|^2$ we get that $r \le |L|/10$ and so, we can repeat the same argument on 
$P, L \cap R$ obtaining a geometric sum that only adds up to a constant. This completes the 
proof of the ST theorem. 
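
The choice of $r$ above is the one that balances the two $r$-dependent terms $|P||L|^{1/2}/r^{1/2}$ and $|L|r$. The following is a minimal numerical sketch of that balancing, with all implied constants dropped; the function names and the sample sizes are illustrative assumptions.

```python
import math


def cell_terms(p, l, r):
    """The two r-dependent terms in the cell-partition bound (constants dropped)."""
    return p * l ** 0.5 / r ** 0.5, l * r


def balanced_r(p, l):
    """The choice r = |P|^(2/3) / |L|^(1/3) made in the text."""
    return p ** (2 / 3) / l ** (1 / 3)


p_size, l_size = 10 ** 6, 10 ** 6   # hypothetical |P| and |L|
r = balanced_r(p_size, l_size)
open_cells, boundary = cell_terms(p_size, l_size, r)
# Both terms should agree with (|P| * |L|)^(2/3) up to rounding.
print(r, open_cells, boundary, (p_size * l_size) ** (2 / 3))
```

A smaller $r$ makes the first term dominate and a larger $r$ makes the second dominate, which is why the balanced value gives the best bound obtainable from this argument.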

Proof of the cell partition lemma 

We only sketch the proof. The proof will be probabilistic. We will pick a random set of the 
lines in $L$ to be the set $R$ (plus some additional segments) and will argue that it satisfies 
the lemma with positive probability (this will imply that a good choice exists). Arguments 
of this type are common in combinatorics and are usually referred to as the 'probabilistic 
method'. We will make two simplifying assumptions: one is that at most two lines pass 
through a point (this can be removed by a limiting argument). The second is that there are 
no vertical lines in $L$ and that no point in $P$ is on a vertical line through the intersection of 
two lines in $L$ (this can be removed by a random rotation). 

The particular procedure we will use to pick the partition is the following: first we take 
each line $\ell \in L$ to be in $R$ with probability $r/|L|$. This will give us $O(r)$ lines with high 
probability (say, at least 0.99). This set of lines can create at most $O(r^2)$ cells. Then, we 'fix' 
the partition so that each cell has a bounded (at most 4) number of line segments bordering 
it. This 'fix' is done by adding vertical line segments through every point that is adjacent 
to a cell with more than 4 border segments (the number 4 is not important, it can be any 
constant). These extra line segments are not in $L$ and, by our 'random rotation' assumption, 
do not hit any point in $P$. One can verify that adding these segments does not increase the 
number of cells above $O(r^2)$ (there are at most $O(r^2)$ initial 'corners' to fix). 

Having described the probabilistic construction we turn to analyze the probability of a 
cell having too many lines passing through it. Consider a cell with 4 border segments. Each 
line passing through the cell must intersect at least one of these bordering segments. If there 
are more than $M$ lines in the cell then one segment must have at least $M/2$ lines in $L$ passing 
through it. Since all of these lines were not chosen in the partition we get that this event (for 
this specific segment) happens with probability at most 

$$(1 - r/|L|)^{M/2}.$$

Taking $M$ to be roughly $\sim 100|L| \log |L|/r$ we get that this probability is at most $|L|^{-100}$. 
Therefore, we can bound the probability of the union of all 'bad' events of this form (i.e., of a particular segment 
or a line in $L$ containing a series of $M$ lines not chosen in $R$) by the product of the number 
of events times $|L|^{-100}$. Since the number of bad events is much smaller than $|L|^{100}$ we get 
that there exists a partition with a bound of $O(|L| \log |L|/r)$. A more careful argument can 
get rid of the logarithmic factor by arguing that (a) the number of 'bad' cells is very small 
and (b) we can use induction on this smaller set to get the required partition. 

This proof seems messy but is actually much cleaner than the original partition proof of 
Szemerédi and Trotter (which was deterministic). Next, we will see a much simpler proof of 
ST that does not use cell partition (later on we will see a third proof that uses a very different 
kind of cell partition using polynomials). 

2.1.2 Proof using the crossing number inequality 

Next, we will see a different, very elegant, proof of the ST theorem due to Székely [Sze97] based 
on the powerful crossing number inequality [ACNS82, Lei81]. We will consider undirected 
graphs $G = (V, E)$ on a finite set $V$ of vertices and with a set $E \subset V \times V$ of edges. A drawing 
of a graph is a placing of the vertices in the real plane $\mathbb{R}^2$ with simple curves connecting 
two vertices if there is an edge between them (we omit the 'formal' definition since this is a 
very intuitive notion). For a drawing $D$ of $G$ we denote by $cr(D)$ the number of 'crossings' 
or intersections of edges in the drawing. The crossing number of $G$, denoted $cr(G)$, is the 
minimum over all drawings $D$ of $G$ of $cr(D)$. Thus, a graph is planar if it has a crossing 
number of zero. 

A useful tool when talking about planar graphs is Euler's formula. Given a planar drawing 
$D$ of a connected graph $G = (V, E)$ we have the following equality 

$$|V| - |E| + |F| = 2, \tag{2.7}$$

where $F$ is the set of faces of the drawing (including the unbounded face). The proof is a very 
simple induction on $|F|$. If there is one face then the graph is a tree and so $|V| = |E| + 1$. If 
there are more faces then we can remove a single edge and decrease the number of faces by 
one. 

The proof of the ST theorem will use a powerful inequality called the crossing number 
inequality. This inequality gives a strong lower bound on cr(G) given the number of edges 
in G. As a preliminary step we shall prove a weaker bound (which we will amplify later). 

Claim 2.1.4. Let $G$ be a graph. Then $cr(G) \ge |E| - 3|V|$. 

Proof. W.l.o.g we can assume G is connected and with at least 3 vertices. It is easy to check 
that, if G is planar then 3\F\ < 2\E\ (draw two points on either side of an edge and count 
them once by going over all edges and another by going over all faces, using the fact that 
each face has at least 3 edges adjacent to it). Plugging this into Euler's formula we get that, 
for planar graphs 

2<|F|-(1/3)|£| 

or \E\ < 3\V\. If the claim was false, we could remove at most cr(G) < \E\ — 3\V\ edges and 
obtain a planar drawing of G. The new graph will have at least \E\ — (\E\ — 3\V\) = 3\V\ 
edges - a contradiction. □ 

This is clearly not a very good bound, as some simple examples demonstrate. To get
the final bound we will apply Claim 2.1.4 on a random vertex induced subgraph and do some 
expectation analysis. This is a beautiful example of the power of the probabilistic method. 

Theorem 2.1.5 (Crossing Number Inequality [Lei81, ACNS82]). Let $G$ be a graph. If $|E| \ge 4|V|$ then

$$\mathrm{cr}(G) \ge \frac{|E|^3}{64|V|^2}.$$

Proof. Let $G' = (V', E')$ be a random vertex-induced subgraph with each vertex of $V$ chosen
to be in $V'$ independently with probability $p \in [0, 1]$ to be chosen later. Taking the expectation
of

$$\mathrm{cr}(G') \ge |E'| - 3|V'|$$

we get that

$$p^4 \cdot \mathrm{cr}(G) \ge p^2 \cdot |E| - 3p \cdot |V|.$$

The right hand side is equal to the expectation of the r.h.s. of the original inequality by
linearity of expectation (a vertex survives with probability $p$ and an edge with probability $p^2$).
The left hand side requires some explanation: consider a single
crossing in a drawing $D$ of $G$ which has the smallest number of crossings. This crossing will
remain after the random restriction with probability $p^4$. Thus, the expected number of
crossings that remain is $p^4 \, \mathrm{cr}(G)$. This is however only an upper bound on the expected $\mathrm{cr}(G')$ since
there could be new ways of obtaining an even better drawing after we move to $G'$ (but this
inequality is in the 'right' direction so we're fine). Setting $p = 4|V|/|E|$ (which is at most 1 by
our assumption) gives the required bound. □
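The final substitution can be written out explicitly. The following is only a routine check of the constant, obtained by dividing the inequality $p^4 \cdot \mathrm{cr}(G) \ge p^2 |E| - 3p|V|$ by $p^4$ and plugging in $p = 4|V|/|E|$:

```latex
\begin{align*}
\mathrm{cr}(G) &\ge \frac{|E|}{p^2} - \frac{3|V|}{p^3}, \qquad p = \frac{4|V|}{|E|}, \\
               &= \frac{|E|^3}{16|V|^2} - \frac{3|E|^3}{64|V|^2}
                = \frac{|E|^3}{64|V|^2}.
\end{align*}
```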

We now prove the ST theorem using this inequality (this proof is by Szekely [Sze97]): Let
$P, L$ be our finite sets of points and lines (as above). Put aside lines that have at most one point on
them (these contribute at most $|L|$ incidences) so that every line has at least two points. Consider
the drawing of the graph whose vertex set is $P$ and in which two points share an edge if (1) they are
on the same line and (2) there is no third point on the line segment connecting them. The
number of edges on a line $\ell \in L$ is $|P_\ell| - 1$, where $P_\ell = P \cap \ell$. The total number of edges is

$$\sum_{\ell \in L} (|P_\ell| - 1) \ge \frac{1}{2} |I(P, L)|.$$

The crossing number of this graph is at most $|L|^2$ since each crossing is obtained from the
intersection of two lines in $L$. Plugging all this into the crossing number inequality we get
that either $|I(P, L)| \lesssim |P|$ or that

$$|L|^2 \gtrsim \frac{|I(P, L)|^3}{|P|^2}.$$

Putting all of this together we get $|I(P, L)| \lesssim (|P||L|)^{2/3} + |P| + |L|$.
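To get a feeling for the bound, one can count incidences by brute force on a small example. The sketch below uses a standard grid-type configuration (points $[k] \times [2k^2]$ and lines $y = ax + b$ with $a \in [k]$, $b \in [k^2]$); the function names are ours, and the hidden constant in the bound is simply set to 1 for comparison.

```python
def grid_incidences(k):
    """Count incidences between the grid [k] x [2k^2] and the lines y = a*x + b
    with a in [k] and b in [k^2] (all integer ranges start at 1)."""
    points = {(x, y) for x in range(1, k + 1) for y in range(1, 2 * k * k + 1)}
    lines = [(a, b) for a in range(1, k + 1) for b in range(1, k * k + 1)]
    # Every incidence has x in [k], so it suffices to scan those x values.
    incidences = sum(
        1 for (a, b) in lines for x in range(1, k + 1) if (x, a * x + b) in points
    )
    return len(points), len(lines), incidences


def st_bound(num_points, num_lines):
    """Right-hand side of the ST bound with the hidden constant set to 1."""
    return (num_points * num_lines) ** (2 / 3) + num_points + num_lines


for k in (2, 3, 4):
    n_points, n_lines, inc = grid_incidences(k)
    print(k, n_points, n_lines, inc, round(st_bound(n_points, n_lines), 1))
```

In this configuration every line contains exactly $k$ grid points, so the number of incidences is $k^4$, which is of the same order as $(|P||L|)^{2/3} = (2k^6)^{2/3}$.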

Notice that this proof can be made to work with simple curves instead of lines as long
as two curves intersect in at most $O(1)$ points and two points sit on at most $O(1)$ curves
together. In particular, a set $P$ of points and a set $C$ of unit circles can have at most
$\lesssim (|C||P|)^{2/3} + |C| + |P|$ incidences (we will use this fact later).

2.2 Applications of Szemeredi-Trotter over R

We mentioned that the crossing number inequality proof of the Szemeredi-Trotter theorem
works also for unit circles. In general, the following is also true and very useful
(the proof is left as an easy exercise using the crossing number inequality).

Theorem 2.2.1. Suppose we have a family $\Gamma$ of simple curves (i.e., curves that do not intersect
themselves) and a set of points $P$ in $\mathbb{R}^2$ such that (1) every two points define at most $C$
curves in $\Gamma$ and (2) every two curves in $\Gamma$ intersect in at most $C$ points. Then

$$|I(\Gamma, P)| \lesssim (|\Gamma| \cdot |P|)^{2/3} + |\Gamma| + |P|,$$

where the hidden constant in the inequality depends only on $C$.

If we only have a bound on the number of curves passing through $k$ of the points (for
some integer $k \ge 2$) the following was shown by Pach and Sharir (and is not known to be
tight for $k \ge 3$):

Theorem 2.2.2 (Pach-Sharir [PS98]). Let $\Gamma$ be a family of curves in the plane and let $P$ be
a set of points. Suppose that through every $k$ points of $P$ there are at most $C$ curves and that
every two curves intersect in at most $C$ points. Then

$$|I(\Gamma, P)| \lesssim |P|^{k/(2k-1)} \cdot |\Gamma|^{1 - 1/(2k-1)} + |\Gamma| + |P|,$$

where the hidden constant depends only on $C$.



In particular, this bound can be used for families of algebraic curves of bounded degree. 

A simple but useful corollary of the ST theorem is the following. The proof is an easy 
calculation and is left to the reader. 



Corollary 2.2.3. Let $P$ and $L$ be sets of points and lines in $\mathbb{R}^2$. For $k > 1$ let $L_k$ denote the
set of lines in $L$ that contain at least $k$ elements of $P$. Then,

$$|L_k| \lesssim \frac{|P|^2}{k^3} + \frac{|P|}{k}.$$
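One way to carry out the calculation for the bound $|L_k| \lesssim |P|^2/k^3 + |P|/k$ is sketched below. The symbol $c$ for the hidden constant in the ST theorem is our notation, and we assume $k \ge 2$ so that every line in $L_k$ is determined by a pair of points of $P$.

```latex
% c denotes the hidden constant in the ST bound (our notation).
\begin{align*}
k\,|L_k| \;\le\; |I(P, L_k)| \;\le\; c\left( (|P|\,|L_k|)^{2/3} + |P| + |L_k| \right)
\;\le\; 3c \cdot \max\left\{ (|P|\,|L_k|)^{2/3},\; |P|,\; |L_k| \right\}.
\end{align*}
% Case 1: the first term is the maximum.
\begin{align*}
k\,|L_k| \le 3c\,(|P|\,|L_k|)^{2/3}
\;\Longrightarrow\; |L_k|^{1/3} \le \frac{3c\,|P|^{2/3}}{k}
\;\Longrightarrow\; |L_k| \le \frac{27c^3\,|P|^2}{k^3}.
\end{align*}
% Case 2: the second term is the maximum.
\begin{align*}
k\,|L_k| \le 3c\,|P| \;\Longrightarrow\; |L_k| \le \frac{3c\,|P|}{k}.
\end{align*}
% Case 3: the third term is the maximum, so k <= 3c. Every line in L_k
% contains at least two points of P, hence |L_k| <= |P|^2, and
\begin{align*}
|L_k| \le |P|^2 \le \frac{27c^3\,|P|^2}{k^3}.
\end{align*}
```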

2.2.1 Beck's theorem 

A nice theorem that follows from ST using purely combinatorial arguments is Beck's theorem: 

Theorem 2.2.4 (Beck's theorem [Bec83]). Let $P$ be a set of points in the plane and let $L$ be
the set of lines containing at least 2 points in $P$. Then one of these two cases must hold:

1. There exists a line in $L$ that contains $\gtrsim |P|$ points.

2. $|L| \gtrsim |P|^2$.

In other words, if a set of points $P$ defines $\ll |P|^2$ lines then there is a line containing a
constant fraction of the points.

Proof. Let $n = |P|$. We partition the lines in $L$ into $\sim \log n$ sets $L_j \subseteq L$, where the $j$'th set contains
the lines with $\sim 2^j$ points from $P$ (more precisely, the lines that contain between $2^j$ and $2^{j+1}$
points). Using Corollary 2.2.3 on each of the $L_j$'s we get that

$$|L_j| \lesssim \frac{n^2}{2^{3j}} + \frac{n}{2^j}.$$

Take $C$ to be some large constant to be chosen later and let $M = \bigcup_{C \le 2^j \le n/C} L_j$ be the set of
lines with 'medium' multiplicity. Since each line in $L_j$ contains $\sim 2^{2j}$ pairs of points we get
that there are at most $\lesssim n^2/2^j + 2^j n$ pairs of points on lines in $L_j$. Summing over all $j$'s with
$C \le 2^j \le n/C$ and taking $C$ to be sufficiently large we get that the number of pairs of points
on lines in $M$ is at most $n^2/100$. Thus, there are two cases: either there is a line with at
least $n/C$ points and we are done, or there are $\gtrsim n^2$ pairs of points on lines that contain
at most $C$ points each. In the latter case, since each such line accounts for at most $C^2$ pairs,
we get that $|L| \gtrsim n^2/C^2$, which is quadratic in $n$. □

2.2.2 Erdos unit distance and distance counting problems



Let $A$ be a set of points in $\mathbb{R}^2$. We define

$$\Delta_1(A) = \{(p, q) \in A^2 \mid \|p - q\| = 1\},$$

(i.e., all pairs of Euclidean distance 1). Erdos conjectured (and this is still open) that for all
sets $A$ we have $|\Delta_1(A)| \le C(\epsilon) \cdot |A|^{1+\epsilon}$ for all $\epsilon > 0$. Again, the construction which obtains
the maximal number of unit distances is a grid (this time, however, the step size must be
chosen carefully using number theoretic properties).

Using the unit-circle version of the ST theorem we can get a bound of $|\Delta_1(A)| \lesssim |A|^{4/3}$,
which is the best bound known. To see this, consider the $|A|$ circles of radius 1 centered
at the points of $A$. Two circles can intersect in at most two points and every two points
define at most two radius one circles that pass through them. Therefore, we can use the ST
theorem to bound the number of incidences between these circles and the points of $A$ by
$\lesssim |A|^{4/3}$. In four dimensions the number of unit
distances in an arrangement can be as high as $\sim |A|^2$. In three dimensions the answer is not
known.
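The quantity being bounded is easy to compute by brute force on small examples. The following sketch (function names are ours) counts ordered pairs at a given squared distance, using integer coordinates so that all comparisons are exact. Note that the plain unit-step grid used here is only an illustration of the definition; it is not the carefully scaled grid alluded to above.

```python
def pairs_at_squared_distance(points, sq_dist):
    """Number of ordered pairs (p, q) of points with ||p - q||^2 == sq_dist."""
    pts = list(points)
    return sum(
        1
        for p in pts
        for q in pts
        if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 == sq_dist
    )


def square_grid(s):
    """The s x s integer grid {0, ..., s-1}^2."""
    return [(x, y) for x in range(s) for y in range(s)]


grid = square_grid(8)
count = pairs_at_squared_distance(grid, 1)
# Compare the count with |A|^(4/3) (hidden constant ignored).
print(len(grid), count, round(len(grid) ** (4 / 3), 1))
```

Passing a different `sq_dist` counts pairs at another fixed distance, which is the "by scaling" remark used when bounding the number of distinct distances.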

A related question of Erdos is to lower bound the number of distinct distances defined by
the pairs in $A$. Let

$$\mathrm{dist}(A) = \{\|p - q\| \mid p, q \in A\}.$$

It was conjectured by Erdos that

$$|\mathrm{dist}(A)| \gtrsim \frac{|A|}{(\log |A|)^{1/2}}.$$

Considering $n$ points on a $\sqrt{n} \times \sqrt{n}$ integer grid gives an example showing this bound is tight.
Using the bound on unit distances above (which, by scaling, bounds the maximal number
of distances equal to any real number) we immediately get a lower bound of $\gtrsim |A|^{2/3}$ on
the number of distinct distances. A result which comes incredibly close to proving Erdos's
conjecture (with $\log$ instead of $\log^{1/2}$) is a recent breakthrough of Guth and Katz which uses
a three dimensional incidence theorem in the spirit of the ST theorem (we will see this proof
later on).
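The grid example can be explored numerically. The sketch below (our own helper, not taken from the survey) counts the distinct non-zero distances determined by a $\sqrt{n} \times \sqrt{n}$ integer grid; it works with squared distances, which are integers and are in bijection with the distances themselves.

```python
def distinct_squared_distances(points):
    """Set of non-zero squared distances determined by pairs of points."""
    pts = list(points)
    return {
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        for p in pts
        for q in pts
        if p != q
    }


for s in (4, 8, 16, 32):
    grid = [(x, y) for x in range(s) for y in range(s)]
    # Print the number of points next to the number of distinct distances.
    print(len(grid), len(distinct_squared_distances(grid)))
```

The printed counts grow more slowly than the number of points, in line with the grid being a near-extremal example.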

2.2.3 Sum Product theorem over R 

Let $A \subset \mathbb{R}$ be a finite set and define

$$A + A = \{a + a' \mid a, a' \in A\}$$

and

$$A \cdot A = \{a \cdot a' \mid a, a' \in A\}$$

to be the sumset and product set of $A$. If $A$ is an arithmetic progression then we have
$|A + A| \lesssim |A|$ and if $A$ is a geometric progression we have $|A \cdot A| \lesssim |A|$. But can $A$ be both?
In other words, can we have an 'approximate' sub-ring $A$ of the real numbers? (One can also
ask this for the integers.) Using the ST theorem, Elekes [Ele97] proved the following theorem.
The same theorem with the constant 5/4 replaced by some other constant larger than 1 was
proved earlier by Erdos and Szemeredi [ES83].

Theorem 2.2.5 (Sum-Product Theorem over R). Let $A \subset \mathbb{R}$ be a finite set. Then

$$\max\{|A + A|, |A \cdot A|\} \gtrsim |A|^{5/4}.$$

Proof. Let $P = (A \cdot A) \times (A + A)$ and let $L$ contain all lines defined by an equation of the
form $y = ax + b$ with $a \in A^{-1}$ and $b \in A$ ($A^{-1}$ is the set of inverses of elements of $A$; we
can discard the zero). Then $|L| = |A|^2$. Each line in $L$ has $\ge |A|$ incidences with $P$ (take all
$(x, y)$ on the line with $x = a^{-1} \cdot x'$ for some $x' \in A$, so that $y = x' + b$) and so, by the ST theorem, we have

$$|A|^3 \lesssim |A \cdot A|^{2/3} \cdot |A + A|^{2/3} \cdot |A|^{4/3}.$$

If both $|A \cdot A|$ and $|A + A|$ were $\ll |A|^{5/4}$ we would get that the r.h.s. is smaller than the
l.h.s., a contradiction. □

A more intricate application of the ST theorem can give an improved bound of
$\max\{|A + A|, |A \cdot A|\} \gtrsim |A|^{14/11}$ [Sol05]. The conjectured bound is $\sim |A|^{2-\epsilon}$ for all $\epsilon > 0$.
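The tension between sums and products is easy to see on the two model examples. The following sketch (helper names are ours) computes both sets for an arithmetic and a geometric progression of integers and prints their sizes next to $|A|^{5/4}$, ignoring hidden constants.

```python
def sumset(A):
    """The set A + A of all pairwise sums."""
    return {a + b for a in A for b in A}


def productset(A):
    """The set A * A of all pairwise products."""
    return {a * b for a in A for b in A}


n = 10
arithmetic = list(range(1, n + 1))        # 1, 2, ..., n
geometric = [2 ** i for i in range(n)]    # 1, 2, 4, ..., 2^(n-1)

for name, A in (("arithmetic", arithmetic), ("geometric", geometric)):
    print(name, len(sumset(A)), len(productset(A)), round(len(A) ** 1.25, 1))
```

For the arithmetic progression the sumset has only $2n - 1$ elements while the product set is much larger; for the geometric progression the roles are reversed.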

2.2.4 Number of points on a convex curve 

Let $\gamma$ be a strictly convex curve in the plane contained in the range $[n] \times [n]$. How many
integer lattice points can $\gamma$ have? Using the curve version of the ST theorem we can bound
this by $\lesssim n^{2/3}$ (this proof is due to Iosevich [Ios04]). This bound is tight and the example
which matches it is the convex hull of integer points contained in a ball of radius $n$ [BL98].
The argument proceeds as follows: take the family $\Gamma$ to include all curves obtained from $\gamma$ by
translating it by all integer points in $[n] \times [n]$. One has $|\Gamma| = n^2$. We take $P$ to be all integer
points that are on some curve from $\Gamma$ so that $|P| \lesssim n^2$. The condition on the number of
curves through every two points and the maximum intersection of two curves can be readily
verified (here we must use the strict convexity). Thus the number of incidences is at most
$\lesssim n^{8/3}$. Since the curves are all translations of each other they all contain the same number of
integer points. Therefore, each one will contain at most $\lesssim n^{2/3}$ points.

2.3 The Elekes-Sharir framework 

In a recent breakthrough Guth and Katz [GK10b] proved that any finite set of points $P$ in
the real plane defines at least $\gtrsim |P|/\log |P|$ distinct distances. This is tight up to a $\sqrt{\log |P|}$
factor and was conjectured by Erdos [Erd46]. The proof combines ideas that were developed
in previous works and in particular a general framework developed by Elekes and Sharir in
[ES10] that gives a 'generic' way to approach such problems by reducing them to an incidence
problem. (The original paper of Elekes and Sharir reduced the distance counting problem
to an incidence problem between higher degree curves but this was simplified by Guth and
Katz to give lines instead of curves.)

To begin, observe that a 4-tuple of points $a, b, c, d \in P$ satisfies $\|a - b\| = \|c - d\|$ iff there
exists a rigid motion $T : \mathbb{R}^2 \to \mathbb{R}^2$ (rotation + translation) such that $T(a) = c$ and $T(b) = d$.
Let us denote the set of rigid motions by $\mathcal{R}$. To define a rigid motion we need to specify a
translation (which has two independent parameters) and a rotation (one parameter); thus, we
can think of $\mathcal{R}$ as a three dimensional space. Later on we will fix a concrete parametrization
of $\mathcal{R}$ (minus some points) as $\mathbb{R}^3$ but for now it doesn't matter. For each $a, b \in P$ we define
the set

$$L_{a,b} = \{T \in \mathcal{R} \mid T(a) = b\}$$

of rigid motions mapping $a$ to $b$. This set is 'one dimensional' since, after specifying the image
of $a$ we can only change the rotation parameter. In our concrete parametrization (which we
will see shortly) all of the sets $L_{a,b}$ will in fact be lines in $\mathcal{R} = \mathbb{R}^3$. Let $L = \{L_{a,b} \mid a, b \in P\}$
be the set of $|P|^2$ lines defined by the point set $P$. We would like to bound the number of
distances defined by $P$, denoted $d(P)$, as a function of the number of incidences between the
lines in $L$. To this end, consider the following set:

$$Q(P) = \{(a, b, c, d) \in P^4 \mid \|a - b\| = \|c - d\|\}.$$

A quick Cauchy-Schwarz calculation shows that

$$d(P) \gtrsim \frac{|P|^4}{|Q(P)|}.$$

On the other hand, since each 4-tuple in $Q(P)$ gives an intersection between two lines in $L$,
we have that

$$|Q(P)| \sim |I(L)| = |\{(\ell, \ell') \in L^2 \mid \ell \cap \ell' \ne \emptyset\}|.$$

Thus, it will suffice to give a bound of $\lesssim |P|^3 \cdot \log |P|$ on $|I(L)|$ to obtain the bound of Guth-Katz on
$d(P)$. In general, $|P|^2$ lines in $\mathbb{R}^3$ can have many more intersections but, using some special
properties of this specific family of lines, we will in fact obtain this bound.
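For completeness, here is one way to spell out the Cauchy-Schwarz step that gives $d(P) \gtrsim |P|^4/|Q(P)|$. The notation $n_\delta$ is ours; we count ordered pairs and, for simplicity, include the zero distance, so the sums range over at most $d(P) + 1$ values of $\delta$.

```latex
% n_\delta = number of ordered pairs (a, b) in P^2 with ||a - b|| = \delta.
\begin{align*}
\sum_{\delta} n_\delta = |P|^2, \qquad \sum_{\delta} n_\delta^2 = |Q(P)|,
\end{align*}
\begin{align*}
|P|^4 = \Big( \sum_{\delta} n_\delta \Big)^2
\;\le\; (d(P) + 1) \cdot \sum_{\delta} n_\delta^2
\;=\; (d(P) + 1) \cdot |Q(P)|.
\end{align*}
```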

We now describe the concrete parametrization of $\mathcal{R}$ mentioned above. We begin by
removing from $\mathcal{R}$ all translations. It is easy to see that the number of 4-tuples in $Q(P)$
arising from pure translations is at most $|P|^3$ (since every three points determine the fourth
one). Now, every remaining map in $\mathcal{R}$ is a rotation by $\theta \in (0, 2\pi)$ (say, to the right) around some fixed
point $f = (f_x, f_y)$. If $T(a) = b$ then one sees that the fixed point $f$ must lie on the line
$E_{a,b} = \{c \mid \|c - a\| = \|c - b\|\}$ passing through the mid-point $m = (a + b)/2$ and in direction
perpendicular to $a - b$. Our parametrization $\rho : \mathcal{R} \setminus \{\text{translations}\} \to \mathbb{R}^3$ will be defined as

$$\rho(T) = (f_x, f_y, (1/2) \cot(\theta/2)).$$

We now show that

Claim 2.3.1. For each $a, b \in P$ we have that $\rho(L_{a,b})$ is a line in $\mathbb{R}^3$.

Proof. It will help to draw a picture at this point with the two points $a, b$, the line $E_{a,b}$ and
the fixed point $f$ on this line. We will consider the triangle formed by the three points $a$, $f$
and $m = (a + b)/2$ (the point on $E_{a,b}$ that is directly between $a$ and $b$). This is a right angled
triangle with an angle of $\theta/2$ between the segments $fa$ and $fm$. Thus, $(1/2) \cot(\theta/2) = \|f - m\| / \|a - b\|$.
Let $d = (d_x, d_y)$ be a unit vector parallel to $E_{a,b}$. We thus have that $f = m + \|f - m\| \cdot d$ (or
with a minus sign). Setting $t = \|f - m\|$, this shows that

$$\rho(L_{a,b}) = \{(m_x, m_y, 0) + t \cdot (d_x, d_y, \|a - b\|^{-1}) \mid t \in \mathbb{R}\}$$

which is a line. □
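The claim can also be checked numerically. The sketch below is our own illustration under explicit conventions: $m = (a + b)/2$ is the midpoint, $d$ is the unit vector obtained by rotating $b - a$ by 90 degrees counter-clockwise, and $\theta$ is the signed counter-clockwise rotation angle (adding $2\pi$ to $\theta$ does not change $\cot(\theta/2)$). For a fixed point $f = m + t \cdot d$ it computes the rotation about $f$ that takes $a$ to $b$ and compares the resulting triple with the point on the line described in the proof.

```python
import math


def rho_of_rotation(a, b, t):
    """Triple (f_x, f_y, (1/2)cot(theta/2)) for the rotation T with T(a) = b
    whose fixed point is f = m + t*d (m midpoint, d unit normal to b - a)."""
    ux, uy = (b[0] - a[0]) / 2, (b[1] - a[1]) / 2
    half_len = math.hypot(ux, uy)
    dx, dy = -uy / half_len, ux / half_len
    fx = (a[0] + b[0]) / 2 + t * dx
    fy = (a[1] + b[1]) / 2 + t * dy
    # Signed angle of the rotation about f that takes a to b.
    ax, ay = a[0] - fx, a[1] - fy
    bx, by = b[0] - fx, b[1] - fy
    theta = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    # Sanity check: rotating a about f by theta really gives b.
    rx = fx + math.cos(theta) * ax - math.sin(theta) * ay
    ry = fy + math.sin(theta) * ax + math.cos(theta) * ay
    assert math.isclose(rx, b[0], abs_tol=1e-9)
    assert math.isclose(ry, b[1], abs_tol=1e-9)
    return (fx, fy, 0.5 / math.tan(theta / 2))


def point_on_claimed_line(a, b, t):
    """The point (m_x, m_y, 0) + t * (d_x, d_y, 1/||a - b||) from the proof."""
    ux, uy = (b[0] - a[0]) / 2, (b[1] - a[1]) / 2
    half_len = math.hypot(ux, uy)
    dx, dy = -uy / half_len, ux / half_len
    return (
        (a[0] + b[0]) / 2 + t * dx,
        (a[1] + b[1]) / 2 + t * dy,
        t / (2 * half_len),
    )


a, b = (1.0, 2.0), (-3.0, 5.0)
for t in (-2.0, 0.5, 1.0, 3.0):
    print(rho_of_rotation(a, b, t), point_on_claimed_line(a, b, t))
```

Negative values of `t` correspond to the "minus sign" case mentioned in the proof, with the fixed point on the other side of the segment.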

Let $N = |P|^2$. We have $N$ lines in $\mathbb{R}^3$ and want to show that there are at most $\sim N^{1.5} \log N$
incidences. This is clearly not true for an arbitrary set of $N$ lines. A trivial example where
the number of incidences is $N^2$ is when all lines pass through a single point. Another example
is when the lines are all in a single plane inside $\mathbb{R}^3$. If no two lines are parallel we would
have $\sim N^2$ incidences (every pair intersects). Surprisingly enough, there is only one more
type of example with $\sim N^2$ incidences: doubly ruled surfaces. Take for example the set
$S = \{(x, y, xy) \mid x, y \in \mathbb{R}\}$. This set is 'ruled' by two families of lines: the lines of the form
$\{(x, y, xy) \mid x \in \mathbb{R}\}$ (with $y$ fixed) and the lines of the form $\{(x, y, xy) \mid y \in \mathbb{R}\}$ (with $x$ fixed). If we take $N/2$ lines from
each family we will get $\sim N^2$ intersections. In other words, the set $S$ contains a 'grid' of
lines (horizontal and vertical) such that every horizontal line intersects every vertical line.
In general a doubly ruled surface is defined as a surface in which every point has two lines
passing through it. A singly ruled surface is a surface in which every point has at least one line
passing through it. It is known that the only non-planar doubly ruled surfaces (up to linear
change of coordinates) are the one we just saw and the surface $\{(x, y, z) \mid x^2 + y^2 - z^2 = 1\}$.
There are no non-planar triply ruled surfaces.

Guth and Katz proved the following: 

Theorem 2.3.2 (Guth-Katz). Let $L$ be a set of $N$ lines in $\mathbb{R}^3$ such that no more than $\sqrt{N}$
lines intersect at a single point and no plane or doubly ruled surface contains more than $\sqrt{N}$
lines. Then the number of incidences of lines in $L$, $|I(L)|$, is at most $\lesssim N^{1.5} \cdot \log N$.

The bound in the theorem is tight (even with the logarithmic factor) and clearly the 
conditions cannot be relaxed. Luckily, the lines $L_{a,b}$ defined by a point set $P$ in the above
manner satisfy the conditions of the theorem and so this theorem implies the bound on the
number of distinct distances. An example of a set of lines matching the bound in the theorem
is as follows: Take an $S \times S$ grid in the plane $z = 0$ and another identical grid in the plane
$z = 1$ and pass a line through every two points, one on each grid. The number of lines is thus
$N = S^4$ and a careful calculation shows that the number of incidences is $\sim S^6 \log S$ (see the
appendix in Guth and Katz's original paper for the proof).

For each $k$ let $I_{\ge k}(L)$ denote the set of points that have at least $k$ lines in $L$ passing
through them. Define $I_{=k}(L)$ in a similar manner requiring that there are exactly $k$ lines
through the point. Theorem 2.3.2 will follow from the following lemma (which is also tight
using the same construction as above).

Lemma 2.3.3. Let $L$ be as in Theorem 2.3.2. Then for every $k \ge 2$,

$$|I_{\ge k}(L)| \lesssim \frac{N^{1.5}}{k^2}.$$

Let us see how this Lemma implies Theorem 2.3.2.

$$\begin{aligned}
|I(L)| &= \sum_{k \ge 2} |I_{=k}(L)| \cdot k^2 \\
&= \sum_{k} \left( |I_{\ge k}(L)| - |I_{\ge k+1}(L)| \right) \cdot k^2 \\
&\lesssim \sum_{k} |I_{\ge k}(L)| \cdot k \\
&\lesssim N^{1.5} \sum_{k} \frac{1}{k} \lesssim N^{1.5} \cdot \log N.
\end{aligned}$$

The cases $k = 2$ and $k \ge 3$ of the Lemma are proved in [GK10b] using different arguments
(the case $k = 3$ was proved earlier in [EKS11]). Interestingly, the case $k \ge 3$ does not require
the condition on doubly ruled surfaces and can be proven without it.

We still need to show that the lines $L_{a,b}$ coming from $P$ satisfy the conditions of Theorem 2.3.2
(we omit the mapping $\rho$ at this point to save on notation). To see that at most
$\sqrt{N} = |P|$ lines meet at a point observe that, if not, we could find two lines $L_{a,b}$ and $L_{a,b'}$ with $b \ne b'$ that
intersect. This clearly cannot happen since this would imply that there is a rigid motion $T$
mapping $a$ to $b$ and also mapping $a$ to $b'$. Let us consider now the maximum number of lines
in a plane. Let $L_a = \{L_{a,b} \mid b \in P\}$ and observe that the lines in $L_a$ are disjoint. Moreover, by
the parametrization of the lines, we have that all lines in $L_a$ are of different directions. Thus,
every plane can contain at most one line from $L_a$. Thus, there could be at most $|P| = \sqrt{N}$
lines in a single plane. The condition regarding doubly ruled surfaces is more delicate and
can be found in the Guth-Katz paper.

In the next few sections we will develop the necessary machinery for proving Lemma 2.3.3. 
Since the proof is quite lengthy we will omit the proofs of some claims having to do with 
doubly ruled surfaces that are used in the proof of the k = 2 case. As a 'warmup' to the 
full proof we will see two proofs of related theorems which use this machinery in a slightly 
simpler way. One of these will be yet another proof of the Szemeredi-Trotter theorem, this
time using the polynomial ham sandwich theorem. The other will be the proof of the Joints
Conjecture which uses the polynomial method.

2.4 The Polynomial Method and the Joints Conjecture

The polynomial method is used to impose an algebraic structure on a geometric problem. 
The basic ingredient in this method is the following simple claim which holds over any field. 
Notice that the phrase 'non-zero polynomial' used in the claim refers to a polynomial with 
at least one non-zero coefficient (over a finite field such a polynomial might still evaluate to
zero everywhere). 

Claim 2.4.1. Let $P \subset \mathbb{F}^n$ be a finite set, with $\mathbb{F}$ some field. If $|P| < \binom{n+d}{d}$ then there exists
a non-zero polynomial $g \in \mathbb{F}[x_1, \ldots, x_n]$ of degree $\le d$ such that $g(p) = 0$ for all $p \in P$.

Proof. Each constraint of the form $g(p) = 0$ is a homogeneous linear equation in the coefficients
of $g$. The number of coefficients for a polynomial of degree at most $d$ in $n$ variables is exactly
$\binom{n+d}{d}$, which is larger than the number of equations $|P|$, and so the system has a non-zero solution. □


The second component of the polynomial method is given by bounding the maximum 
number of zeros a polynomial can have. In the univariate case, this is given by the following 
well-known fact. Later, we will see a variant of this claim for polynomials with more variables. 

Claim 2.4.2. A non-zero univariate polynomial g(x) over a field F can have at most deg(g) 
zeros. 
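The interpolation claim is constructive: a vanishing polynomial can be found by solving a homogeneous linear system. The sketch below does this over the rationals with exact arithmetic; the function names and the dictionary representation of polynomials (exponent vector mapped to coefficient) are our own choices, and the same elimination would work over any field.

```python
from fractions import Fraction
from itertools import product
from math import comb


def monomials(n, d):
    """Exponent vectors of all monomials in n variables of total degree <= d."""
    return [e for e in product(range(d + 1), repeat=n) if sum(e) <= d]


def evaluate(poly, point):
    """Evaluate a polynomial given as {exponent vector: coefficient}."""
    total = Fraction(0)
    for exps, coeff in poly.items():
        term = Fraction(coeff)
        for x, e in zip(point, exps):
            term *= Fraction(x) ** e
        total += term
    return total


def vanishing_polynomial(points, d):
    """Return a non-zero polynomial of degree <= d vanishing on all points.

    Requires len(points) < C(n + d, d), as in the interpolation claim.
    """
    n = len(points[0])
    monos = monomials(n, d)
    assert len(monos) == comb(n + d, d)
    assert len(points) < len(monos)
    # One homogeneous linear equation per point, in the unknown coefficients.
    rows = [[evaluate({m: 1}, p) for m in monos] for p in points]
    pivots = []  # pivots[i] is the pivot column of row i after elimination
    for col in range(len(monos)):
        r = len(pivots)
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][col]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(col)
    # Fewer equations than unknowns, so some column has no pivot: set it to 1.
    free = next(c for c in range(len(monos)) if c not in pivots)
    solution = [Fraction(0)] * len(monos)
    solution[free] = Fraction(1)
    for i, col in enumerate(pivots):
        solution[col] = -rows[i][free]
    return {m: c for m, c in zip(monos, solution) if c != 0}


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
g = vanishing_polynomial(pts, 2)  # 5 points < C(4, 2) = 6 coefficients
print(g)
```

Here five points in the plane are handled by a polynomial of degree at most 2, since such polynomials have $\binom{4}{2} = 6$ coefficients.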

To illustrate the power of the polynomial method we will see how it gives a simple proof 
of the 'Joints Conjecture'. To this end we must first prove some rather easy claims about 
restrictions of polynomials. Let $g \in \mathbb{F}[x_1, \ldots, x_n]$ be a degree $d$ polynomial over a field $\mathbb{F}$.

Let $S \subset \mathbb{F}^n$ be any affine subspace of dimension $k$. We can restrict $g$ to $S$ in the following
way: write $S$ as the image of a degree one mapping $\phi : \mathbb{F}^k \to \mathbb{F}^n$ so that

$$S = \{\phi(t_1, \ldots, t_k) \mid t_1, \ldots, t_k \in \mathbb{F}\}.$$

The restriction of $g$ to $S$ is the polynomial $h(t_1, \ldots, t_k) = g(\phi(t_1, \ldots, t_k))$. This depends in
general on the particular choice of $\phi$ but for our purposes all $\phi$'s will be the same (we can
pick any $\phi$). A basic and useful fact is that $\deg(h) \le \deg(g)$ for any restriction $h$ of $g$.

Suppose now that $\ell$ is a line in $\mathbb{F}^n$ and write $\ell$ as $\ell = \{a + tb \mid t \in \mathbb{F}\}$ for some $a, b \in \mathbb{F}^n$.
Restricting $g$ to $\ell$ we get a polynomial $h(t) = g(a + tb)$. It will be useful to prove
some properties of this restriction. In particular, we would like to understand some of its
coefficients. The constant coefficient is the value of $h$ at zero and is simply $h(0) = g(a)$. On
the other hand, the coefficient of highest degree $t^d$ will be $g_d(b)$, where $g_d(x_1, \ldots, x_n)$ is the
highest degree homogeneous component of $g$ (that is, the sum of all monomials of maximal
degree in $g$). Another coefficient we will try to understand is that of $t$. To this end we will
use the partial derivatives of $g$. Recall that $\partial g/\partial x_i$ is a polynomial in $\mathbb{F}[x_1, \ldots, x_n]$ obtained
by taking the derivative of $g$ as a polynomial in $x_i$ (with coefficients in $\mathbb{F}[x_j, j \ne i]$). This is
defined over any field but notice some weird things can happen if $\mathbb{F}$ has positive characteristic
(e.g., the partial derivative of $x^p$ is zero over $\mathbb{F}_p$ even though this is a non-constant polynomial).
The gradient of $g$ is the vector of polynomials

$$\nabla(g) = (\partial g/\partial x_1, \ldots, \partial g/\partial x_n).$$

This vector has geometric meaning which we will not discuss here. Algebraically, we have
that the coefficient of $t$ in the restriction $h(t) = g(a + tb)$ to a line is exactly $\langle \nabla(g)(a), b \rangle$. To
see this, observe that the coefficient of $t$ is obtained by taking the derivative (w.r.t. the single
variable $t$) and then evaluating the derivative at $t = 0$. Using the chain rule for functions
of several variables we get that the derivative of $h(t)$ is $\langle \nabla(g)(a + tb), b \rangle$ and so the claim
follows.
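As a small worked example of the three coefficient formulas for a restriction to a line (the polynomial and the points are chosen by us purely for illustration), take $g(x, y) = x^2 y + y$, $a = (1, 2)$ and $b = (3, 1)$ over the reals:

```latex
\begin{align*}
h(t) &= g(1 + 3t,\, 2 + t) = (1 + 3t)^2 (2 + t) + (2 + t) \\
     &= 4 + 14t + 24t^2 + 9t^3, \\
h(0) &= 4 = g(1, 2), \\
\langle \nabla(g)(a), b \rangle
     &= \langle (2xy,\; x^2 + 1)\big|_{(1,2)},\, (3, 1) \rangle
      = \langle (4, 2), (3, 1) \rangle = 14, \\
g_3(b) &= x^2 y \,\big|_{(3,1)} = 9 .
\end{align*}
```

The constant term, the coefficient of $t$ and the top coefficient of $h$ match $g(a)$, $\langle \nabla(g)(a), b \rangle$ and $g_3(b)$ respectively.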

2.4.1 The joints problem 

Let $L$ be a set of lines in $\mathbb{R}^3$. A 'joint' w.r.t. the arrangement $L$ is a point $p \in \mathbb{R}^3$ through
which pass at least three non-coplanar lines. The basic question one can ask is 'what is the
maximal number of joints possible in an arrangement of $N$ lines'. This problem, raised in
[CEG+90a] in relation to computer graphics algorithms, was answered completely by Guth
and Katz [GK10a] using a clever application of the polynomial method. This result followed
a long line of papers proving incremental results using various techniques, quite different
from the polynomial method (see [GK10a] for a list of references). This proof of the joints
conjecture by Guth and Katz was the first case where the polynomial method was used
directly to argue about problems in Euclidean space (in contrast to finite fields where it was
more common). Later in [GK10b], ideas from this proof were used in part of the proof of the
Erdos distance problem bound.

A simple lower bound on the number of joints is obtained by taking $L$ to be the
union of the following three sets, each containing $N^2$ lines:

$$L_{xy} = \{(i, j, t) \mid t \in \mathbb{R}\}, \quad i, j \in [N]$$

$$L_{yz} = \{(t, i, j) \mid t \in \mathbb{R}\}, \quad i, j \in [N]$$

$$L_{zx} = \{(i, t, j) \mid t \in \mathbb{R}\}, \quad i, j \in [N]$$

In other words, the set $L$ contains $\sim N^2$ lines in a three dimensional grid. It is easy to check
that every point in $[N]^3$ is a joint and so we have that the number of joints can be as large
as $\sim |L|^{3/2}$. Not surprisingly (at this point), this is tight.
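The grid construction is simple enough to verify by direct enumeration. In the sketch below (the encoding of lines as tuples is ours) each axis-parallel line is recorded by its direction and its two fixed coordinates, and a grid point is counted as a joint when lines in all three axis directions pass through it; three such lines are automatically non-coplanar.

```python
from itertools import product


def grid_joints(N):
    """Return (number of lines, number of joints) for the N x N x N grid example."""
    rng = range(1, N + 1)
    # ('z', i, j): line parallel to the z-axis with x = i, y = j, and similarly
    # for the other two directions.
    lines = {(axis, i, j) for axis in "xyz" for i in rng for j in rng}

    def lines_through(p):
        x, y, z = p
        return {("z", x, y), ("x", y, z), ("y", x, z)} & lines

    joints = [p for p in product(rng, repeat=3) if len(lines_through(p)) == 3]
    return len(lines), len(joints)


for N in (2, 3, 4, 5):
    num_lines, num_joints = grid_joints(N)
    print(N, num_lines, num_joints, round(num_joints / num_lines ** 1.5, 4))
```

The last printed column is the ratio of the number of joints to $|L|^{3/2}$, which stays constant as $N$ grows.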

Theorem 2.4.3 (Guth-Katz [GK10a]). Let $L$ be a set of lines in $\mathbb{R}^3$. Then $L$ defines at most
$\lesssim |L|^{3/2}$ joints.

Proof. The proof given here is a simplified proof found by Kaplan, Sharir and Shustin [KSS10]
who also generalized it to higher dimensions.

Let $J$ be the set of joints and suppose that $|J| > C|L|^{3/2}$ for some large constant $C$ to be
chosen later. We can throw away all lines in $L$ that have fewer than $|J|/(2|L|)$ joints on them.
This can decrease the size of $J$ by at most a half.

Let $g(x, y, z)$ be a non-zero polynomial with real coefficients of minimal degree that vanishes
on the set $J$. We saw in previous sections that $\deg(g) \lesssim |J|^{1/3}$ (since a polynomial of
this degree will have $> |J|$ coefficients).

Since each of the lines in $L$ contains at least $|J|/(2|L|) > \deg(g)$ joints (here we take the constant
$C$ to be large enough) we get that $g$ must vanish identically on each line in $L$. To see this,
observe that the restriction of $g(x, y, z)$ to a line is also a degree $\le \deg(g)$ polynomial and
so, if it is not identically zero, it can have at most $\deg(g)$ zeros. Thus, we have moved from
knowing that $g$ vanishes on all joints to knowing that $g$ vanishes on all lines!

Consider a joint $p \in \mathbb{R}^3$ and let $\ell_1, \ell_2, \ell_3 \in L$ be three non-coplanar lines passing through
$p$. We can find three linearly independent vectors $u_1, u_2, u_3 \in \mathbb{R}^3$ such that for all $i \in [3]$
we have $\ell_i = \{p + t u_i \mid t \in \mathbb{R}\}$. Since $g$ vanishes on these three lines we get that for $i \in [3]$,
the polynomial $h_i(t) = g(p + t u_i)$ is identically zero. This means that all the coefficients of
$h_i(t)$ are zero and in particular the coefficient of $t$ which is, as we saw, equal to $\langle u_i, \nabla(g)(p) \rangle$.
Since the $u_i$'s are linearly independent, we get that $\nabla(g)(p) = 0$ for all joints $p \in J$. One
can now check that, over the reals, a non-constant polynomial $g$ has at least one non-zero partial
derivative, whose degree is strictly less than the degree of $g$. Therefore, we get a contradiction since
we assumed that $g$ is a minimal degree polynomial vanishing on $J$. □

It is not hard to generalize this proof to finite fields using the fact that a polynomial
$g \in \mathbb{F}[x_1, \ldots, x_n]$ all of whose partial derivatives are zero must be of the form $f(x)^p$ for
some other polynomial $f$, where $p$ is equal to the characteristic of the field. For other
generalizations, including to algebraic curves, see [KSS10].

2.5 The Polynomial Ham-Sandwich theorem

One of the main ingredients in the proof of the Guth-Katz theorem is an ingenious use of
the polynomial Ham-Sandwich theorem, originally proved by Stone and Tukey [ST42]. This
is a completely different use of polynomials than the one we saw for the Joints problem and
combines both algebra and topology. We will demonstrate its usefulness by seeing how it can
be used to give yet another proof of the Szemeredi-Trotter theorem in two dimensions. This
proof will be both 'intuitive' (not 'magical' like the crossing number inequality proof) and
simple (without the messy technicalities of the cell partition proof we saw).

The folklore Ham-Sandwich theorem states that every three bounded open sets in $\mathbb{R}^3$ can
be simultaneously bisected using a single plane. If we identify the three sets with two slices 
of bread and a slice of ham we see the practical significance of this theorem. More generally, 
we have: 

Theorem 2.5.1 ([ST42]). Let $U_1, \ldots, U_n \subset \mathbb{R}^n$ be bounded open sets. Then there exists a
hyperplane $H = \{x \in \mathbb{R}^n \mid h(x) = 0\}$ (with $h(x)$ a degree one polynomial in $n$ variables) such
that for each $i \in [n]$ the two sets $U_i \cap H^+ = \{x \in U_i \mid h(x) > 0\}$ and $U_i \cap H^- = \{x \in
U_i \mid h(x) < 0\}$ have equal volume. In this case we say that $H$ bisects each of the $U_i$'s.

This can be thought of as extending the basic fact that for every $n$ points there is a
hyperplane in $\mathbb{R}^n$ that passes through all of them. The proof uses the Borsuk-Ulam theorem
from topology, whose proof can be found in [Mat07].

Theorem 2.5.2 (Borsuk-Ulam [Bor33]). Let $S^n \subset \mathbb{R}^{n+1}$ be the $n$-dimensional unit sphere
and let $f : S^n \to \mathbb{R}^n$ be a continuous map such that $f(-x) = -f(x)$ for all $x \in S^n$ (such a
map is called antipodal). Then there exists $x$ such that $f(x) = 0$.

Proof of the ham-sandwich theorem: Each hyperplane corresponds to some degree one polynomial
$h(x) = h_0 + h_1 x_1 + \ldots + h_n x_n$. Since we only care about the sign of $h$ we can
scale so that the coefficients form a unit vector $v_h = (h_0, h_1, \ldots, h_n)$. We define a function
$f : S^n \to \mathbb{R}^n$ as follows

$$f(v_h) = \left( |H^+ \cap U_i| - |H^- \cap U_i| \right)_{i \in [n]},$$

where $|\cdot|$ denotes volume. It is clear that $f$ is continuous and that $f(-x) = -f(x)$. Thus, there exists a zero of $f$ and
we are done. □

In their original paper, Stone and Tukey also observed that if we want to bisect more 
sets, we can do it as long as we have enough degrees of freedom. One way to allow for more 
degrees of freedom is to replace a hyperplane with a hypersurface. 

A hypersurface is a set $H = \{x \in \mathbb{R}^n \mid h(x) = 0\}$ where now $h$ can be a polynomial of
arbitrary degree $d$. The degree of $H$ is defined to be the degree of its defining polynomial
(we will abuse this definition a bit and say that a hypersurface has degree $d$ if it has degree
bounded by $d$). Recall that, if we have $t < \binom{n+d}{d}$ points in $\mathbb{R}^n$ then we can find, by
interpolation, a non-zero degree $d$ polynomial that is zero on all of them. For the problem of
bisecting open sets the same holds: If the number of sets is smaller than $\binom{n+d}{d}$ we can find a
degree $d$ polynomial that bisects all of the sets.

Theorem 2.5.3 (Polynomial Ham-Sandwich (PHS)). Let $U_1, \ldots, U_t \subset \mathbb{R}^n$ be bounded open
sets with $t < \binom{n+d}{d}$. Then there exists a degree $d$ hypersurface $H$ that bisects each of the sets
$U_i$, $i \in [t]$.

Proof. The proof is identical to the degree one proof. Identify each degree $d$ hypersurface
with its (unit) vector of coefficients and apply the Borsuk-Ulam theorem on the function $f$
mapping to the differences. □

2.5.1 Cell partition using polynomials

The PHS theorem gives a particularly nice way to partition $\mathbb{R}^n$ into cells. In addition to
having a 'balanced' partition (as we had in the cell partition method we saw earlier) we will
have the additional useful property that the boundaries of the partition are defined using a
low degree polynomial. The use of the PHS for cell partition originated in a paper of Guth
[Gut08] on the multilinear Kakeya problem.

The first step for obtaining the cell partition theorem is to take the PHS to the 'limit'
and replace the open sets with discrete sets. If $S \subset \mathbb{R}^n$ is a finite set and $H$ is a hypersurface,
we say that $H$ bisects $S$ if both sets $S \cap H^-$ and $S \cap H^+$ have size at most $|S|/2$. Notice that
this definition allows for an arbitrary number of points from $S$ to belong to the set $H$ itself.

Lemma 2.5.4 (Discrete PHS). Let $S_1, \ldots, S_t \subset \mathbb{R}^n$ be $t$ finite sets of points with $t < \binom{n+d}{d}$.
Then, there exists a degree $d$ hypersurface $H$ that bisects each of the sets $S_i$, $i \in [t]$.

Proof. Consider $\epsilon$-neighborhoods $U_1, \ldots, U_t$ of the sets $S_1, \ldots, S_t$ and apply the PHS on
this family of open sets obtaining a bisecting hypersurface $H_\epsilon$. Taking $\epsilon$ to zero and using
the compactness of the unit sphere we get that there is a sequence of bisecting hypersurfaces
converging to some degree $d$ hypersurface $H$. If one of the sets $S_i \cap H^+$ or $S_i \cap H^-$ had size
larger than $|S_i|/2$ we could find a h.s. $H_\epsilon$ that does not bisect the $\epsilon$-neighborhood of $S_i$. □

Notice that, if $n$, the dimension, is fixed and the number of sets $t$ grows, we always have a degree $d = O_n(t^{1/n})$ polynomial that bisects $t$ sets. In particular, over $\mathbb{R}^2$, a family of $t$ discrete sets can be bisected by a degree $\approx \sqrt{t}$ h.s.
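To make the growth of the degree concrete, here is a minimal sketch that computes the smallest $d$ satisfying the hypothesis $t < \binom{n+d}{d}$ of the discrete PHS lemma. The function name `min_bisecting_degree` is ours and is only meant as an illustration of the counting, not as part of the survey.

```python
from math import comb

def min_bisecting_degree(n: int, t: int) -> int:
    """Smallest d with t < C(n+d, d), i.e. the hypothesis of the discrete PHS lemma."""
    d = 0
    while comb(n + d, d) <= t:
        d += 1
    return d

if __name__ == "__main__":
    for t in (1, 10, 100, 10_000):
        d = min_bisecting_degree(2, t)
        # In the plane C(d+2, 2) is about d^2 / 2, so d grows like sqrt(2t).
        print(t, d, round((2 * t) ** 0.5, 2))
```

In the plane the printed degrees track $\sqrt{2t}$, in line with the $O_n(t^{1/n})$ estimate above.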

We will now use the discrete PHS to get our final cell partition theorem. We will only need this theorem over $\mathbb{R}^2$ and $\mathbb{R}^3$ but will state it over $\mathbb{R}^n$ for all $n$ (it will help to think of $n$ as a fixed constant and of $t$ as growing to infinity).

Theorem 2.5.5 (Polynomial Cell Partition). Let $S \subset \mathbb{R}^n$ be a finite set and let $t \ge 1$. Then, there exists a decomposition of $\mathbb{R}^n$ into $O(t)$ cells (open sets) such that each cell has boundary in a hypersurface $H$ of degree $d = O_n(t^{1/n})$ and each cell contains at most $|S|/t$ points from $S$. Notice that the cells do not have to be connected.

Proof. We will apply the discrete PHS iteratively to obtain finer and finer partitions. Initially, we get a h.s $H_1$ of degree $d_1 \le O_n(1^{1/n})$ that bisects the single set $S$ into two sets of size at most $|S|/2$ each (plus some points on the boundary). Applying the discrete PHS again on these two sets we obtain a degree $d_2 = O_n(2^{1/n})$ h.s $H_2$ that bisects both sets. This gives four cells (with boundary in the h.s $H_1 \cup H_2$, which has degree at most $d_1 + d_2$ since it is defined by the product of the polynomials defining each h.s) with at most $|S|/4$ points in each. Continuing in this fashion $\ell = \log_2 t$ times we obtain $\ell$ hypersurfaces $H_1,\ldots,H_\ell$ with $H_j$ having degree $O_n(2^{j/n})$ and such that their union $H = \bigcup_{j \in [\ell]} H_j$ gives a partition into cells containing at most $|S|/t$ points each. The degree of $H$ is bounded by the sum

$$\sum_{j=1}^{\ell} O_n(2^{j/n}) = O_n(t^{1/n}).$$

□
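The final equality in the proof is a geometric series. A short worked version, written under the assumption that the constants hidden in each $O_n(\cdot)$ term are uniform in $j$, could read:

```latex
\sum_{j=1}^{\ell} 2^{j/n}
  = 2^{1/n}\,\frac{2^{\ell/n}-1}{2^{1/n}-1}
  \le \frac{2^{1/n}}{2^{1/n}-1}\; 2^{\ell/n}
  = \frac{2^{1/n}}{2^{1/n}-1}\; t^{1/n},
  \qquad \ell = \log_2 t .
```

The prefactor depends only on $n$, which is why the total degree is still $O_n(t^{1/n})$ even though $\log_2 t$ hypersurfaces are used.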

2.5.2 Szemeredi-Trotter using polynomials

Using the Polynomial Cell Partition theorem, we either get a 'balanced' partition of most of the points into disjoint cells or there is a large number of points that lie on a low degree hypersurface (hence, they possess an algebraic structure). Kaplan, Matousek and Sharir [KMS11] used this argument to give another proof of the Szemeredi-Trotter theorem which we will now see.

Before we can start the proof we need to prove a very simple algebraic claim that we will 
use in the 'algebraic' case (when many points are on the h.s). In general, the polynomial 
method always requires some additional algebraic claims that depend on the specific problem 
(e.g., for the joints problem we needed to look at the coefficients of the restriction to lines 
and express them using the gradient). In this case we can prove what we need in a few lines. 
In other cases, we will rely on more powerful theorems from algebraic geometry. 

Claim 2.5.6. Let $H \subset \mathbb{R}^2$ be a h.s of degree $d$. Then

1. For every line $\ell \subset \mathbb{R}^2$ not contained in $H$ we have $|\ell \cap H| \le d$.

2. There are at most $d$ lines contained in $H$.

Proof. Let $h(x, y)$ be a polynomial of degree $\le d$ defining $H$. Let $\ell$ be a line and consider the restriction of $h$ to $\ell$. As we already discussed, this is a univariate polynomial of degree at most $\deg(h)$ and so, if it is not identically zero, it can have at most $\deg(h)$ zeros.

To prove 2, suppose that there were $d + 1$ distinct lines $\ell_1,\ldots,\ell_{d+1}$ contained in $H$. A generic line$^1$ $\ell$ will (a) not be contained in $H$ and (b) intersect each of the $d + 1$ lines, in $d + 1$ distinct points. This contradicts 1, and so 2 is proved as well. □

Later we will see a more general statement of this form known as Bezout's theorem.

$^1$ We use the term 'generic line' to mean 'any line outside some set of measure zero'. More accurately, if we parametrize lines as vectors of coefficients defining them, a generic line is any line outside some fixed set of zeros of some polynomial. Over the reals one can simply take a 'perturbed' line. We can also use the word generic for other objects such as hypersurfaces, sets of points etc. with the same meaning.

We can now give the proof of the Szemeredi-Trotter theorem using polynomial cell partition. Let $P, L$ be our sets of points/lines in $\mathbb{R}^2$. As in previous proofs we will use the Cauchy-Schwarz bound(s):

$$|I(P,L)| \lesssim |P||L|^{1/2} + |L|$$

and

$$|I(P,L)| \lesssim |L||P|^{1/2} + |P|.$$

We can also assume that $|P|^{1/2} \ll |L| \ll |P|^2$ (otherwise the theorem follows from the CS bound).

We will apply the polynomial cell partition theorem with parameter $t$ to be chosen later to obtain a hypersurface $H$ of degree $d = O(\sqrt{t})$ and a family of disjoint cells $C_1,\ldots,C_t$ such that $\mathbb{R}^2$ is the disjoint union of the cells and of $H$. For each $i \in [t]$ let $P_i$ denote the set $P \cap C_i$ and let $L_i$ denote the set of lines in $L$ that intersect the cell $C_i$. We also define $P_0$ to be the set of points in $P \cap H$ and $L_0$ to be the set of lines in $L$ that intersect $H$.

We thus have:

$$|I(P,L)| = |I(P_0,L_0)| + \sum_{i=1}^{t} |I(P_i,L_i)|.$$

We will use different arguments to bound each of the terms on the r.h.s. The sum of incidences for $i > 0$ can be bounded as follows. We have for each $i \in [t]$, $|P_i| \le |P|/t$. So, applying the CS bound on each cell we obtain

$$|I(P_i,L_i)| \lesssim (|P|/t) \cdot |L_i|^{1/2} + |L_i|. \qquad (2.8)$$

Since each line in $L_i$ is not contained in $H$ we have, by Claim 2.5.6, that it can intersect at most $d = O(\sqrt{t})$ cells (since it must cross $H$ when it moves from one cell to another). This gives a bound

$$\sum_{i=1}^{t} |L_i| \lesssim t^{1/2}|L|,$$

which, using Cauchy-Schwarz, gives

$$\sum_{i=1}^{t} |L_i|^{1/2} \lesssim t^{3/4}|L|^{1/2}.$$

Combining the above we get

$$\sum_{i=1}^{t} |I(P_i,L_i)| \lesssim (|P|/t) \cdot t^{3/4}|L|^{1/2} + t^{1/2}|L| = t^{-1/4}|P||L|^{1/2} + t^{1/2}|L|. \qquad (2.9)$$

To bound $|I(P_0, L_0)|$ we first split $L_0$ into two sets: $L_0'$ containing lines that are contained in $H$ and $L_0''$ containing lines that are not contained in $H$ (but intersect it at some point). By Claim 2.5.6 we have $|I(P_0, L_0'')| \lesssim t^{1/2}|L|$ since each line in $L_0''$ can intersect $H$ in at most $d \approx \sqrt{t}$ points. We also have $|L_0'| \le d \approx t^{1/2}$ and so, using the CS bound we have

$$|I(P_0, L_0')| \lesssim t^{1/2}|P|^{1/2} + |P| \lesssim t^{1/2}|L| + |P|.$$

Combining the above we get

$$|I(P,L)| \lesssim t^{-1/4}|P||L|^{1/2} + t^{1/2}|L| + |P|.$$

Setting

$$t \approx \frac{|P|^{4/3}}{|L|^{2/3}}$$

gives the Szemeredi-Trotter theorem. We have $t \ge 1$ since $|L| \ll |P|^2$. This completes the proof. □
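As a quick sanity check on the choice of $t$, the sketch below evaluates the two $t$-dependent terms of the bound at $t = |P|^{4/3}/|L|^{2/3}$ and compares them with $(|P||L|)^{2/3}$. The function name `st_terms` and the sample sizes are illustrative assumptions, and all hidden constants are taken to be 1.

```python
def st_terms(num_points: float, num_lines: float) -> tuple[float, float, float]:
    """Return t = |P|^(4/3) / |L|^(2/3) and the two t-dependent terms of the bound."""
    t = num_points ** (4 / 3) / num_lines ** (2 / 3)
    cellular = t ** (-1 / 4) * num_points * num_lines ** 0.5  # t^(-1/4) |P| |L|^(1/2)
    algebraic = t ** 0.5 * num_lines                          # t^(1/2) |L|
    return t, cellular, algebraic

if __name__ == "__main__":
    p, l = 10_000.0, 5_000.0
    t, cellular, algebraic = st_terms(p, l)
    target = (p * l) ** (2 / 3)  # the main Szemeredi-Trotter term
    print(t, cellular, algebraic, target)
```

Both terms come out equal to $(|P||L|)^{2/3}$ up to floating point error, which is exactly the balancing that this choice of $t$ is designed to achieve.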



2.6 The Guth-Katz incidence theorem for lines in $\mathbb{R}^3$

We have now developed enough machinery and intuition to start discussing the proof of the Guth-Katz theorem regarding incidences of lines in $\mathbb{R}^3$. Recall that the statement we are trying to prove is:

Theorem 2.6.1 ([GK10b]). Let $L$ be a set of $N^2$ lines in $\mathbb{R}^3$ such that no more than $N$ lines intersect at a single point and no plane or doubly ruled surface contains more than $N$ lines. Then the number of incidences of lines in $L$, $|I(L)|$, is at most $\lesssim N^3 \cdot \log N$.



Also recall that we argued that this theorem will follow from the following estimate on the sets $I_{\ge k}(L)$ of points that have at least $k$ lines in $L$ passing through them:

Lemma 2.6.2. Let $L$ be as in Theorem 2.6.1. Then for every $k \ge 2$,

$$|I_{\ge k}(L)| \lesssim \frac{N^3}{k^2}.$$



We will prove Lemma 2.6.2 first for $k \ge 3$ and then for $k = 2$ (using different arguments). Since the statement is asymptotic we can actually separate into the two cases when $k$ is either larger than some big constant $C$ or smaller than $C$ (the case $k < C$ will only use the fact that at least two lines meet at a point).

2.6.1 The $k \ge 3$ case

This case of the lemma does not require any conditions on doubly ruled surfaces and so we only assume that no plane contains more than $N$ lines in $L$. We can also assume that $k \le N$ since each point has at most $N$ lines through it.

The high level idea is as follows: Using the polynomial cell partition theorem, we can partition the points in $I_{\ge k}(L)$ into cells whose boundary is a low degree surface. We then separate into two cases: the cellular case and the algebraic case. The cellular case is when most of the points are inside the interior of the cells. In this case we will use the 'weak' three dimensional Szemeredi-Trotter theorem (meaning, the ST theorem one gets from projecting everything to a generic plane) in each cell and sum up the resulting bounds. This case is very similar to the cell partition proof of the ST theorem we saw. The second, and harder, case is when most points are on the algebraic surface. The proof in this case is similar to the proof of the joints conjecture with the added difficulty that some intersections are planar. In the algebraic case we will argue using a degree argument that the surface must contain many of the lines in $L$ (those lines with many points on them). We will then use the assumption $k \ge 3$ to argue that these lines are 'special' in some concrete sense and that a surface that contains too many 'special' lines must contain a plane and this plane must contain many of the lines (contradicting our assumption). In this last part of the proof we will also need to distinguish between points that have 3 non-coplanar lines through them and points through which there are 3 planar lines.

Since the full proof requires some careful book-keeping we will make some simplifying assumptions along the way. These will usually be benign and can be removed easily by simple averaging arguments. To begin, we make the following two 'regularity' assumptions:

1. Every point in $I_{\ge k}(L)$ has at most $2k$ lines in $L$ passing through it. (To remove this assumption we need to argue about each interval $[2^i, 2^{i+1}]$ and sum the results).

2. Each line in $L$ is incident to at least $\gtrsim |I_{\ge k}(L)| \cdot k / |L|$ points of $I_{\ge k}(L)$. This is the 'average' number of points incident to a line, and so many lines will have at least some fraction of this number of points on them.

Let $S = I_{\ge k}(L)$ be the set whose size we wish to bound. Suppose $|S| \ge C \cdot N^3/k^2$ for some large constant $C$ to be specified later. We will use the cell partition lemma obtained from the polynomial Ham-Sandwich theorem (stated here for $\mathbb{R}^3$):

Theorem 2.6.3 (Polynomial Cell Partition). Let $S \subset \mathbb{R}^3$ be a finite set and let $t \ge 1$. Then, there exists a decomposition of $\mathbb{R}^3$ into $\lesssim t$ cells (open sets) such that each cell has boundary in a hypersurface $H$ of degree $d \lesssim t^{1/3}$ and s.t. each cell contains at most $|S|/t$ points.

We will apply this theorem and choose the parameter $t$ so that the hypersurface $H$ will be of degree $d = \lceil 3 \cdot (N/k) \rceil$. We can assume $k \ll N$ since otherwise the bound we are trying to prove is trivial. This guarantees that $d \ge 1$. This means that each cell contains at most $\lesssim |S|/d^3$ points and each line passes through at most $d$ cells (since crossing between cells means intersecting $H$ and a line not contained in $H$ can intersect it in at most $\deg(H)$ points as we already saw). Let $S_H = S \cap H$ and let $S_C = S \setminus S_H$. Clearly, one of these sets will have size at least $|S|/2$. We begin with the case $|S_C| \ge |S|/2$.

The cellular case

Assume $|S_C| \ge |S|/2$. We will use the following easy corollary of the Szemeredi-Trotter theorem. We already used this corollary in dual form (with lines replaced with points) when we proved Beck's theorem. Even though we proved this bound in the plane $\mathbb{R}^2$, the same statement holds in three dimensions using a generic projection to a plane (which preserves intersections).

Corollary 2.6.4. Let $P$ and $L$ be sets of points and lines in $\mathbb{R}^3$. For $k > 1$ let $P_k$ denote the set of points in $P$ that have at least $k$ lines passing through them. Then,

$$|P_k| \lesssim \frac{|L|^2}{k^3} + \frac{|L|}{k}.$$

Let $L_i$ denote the set of lines in $L$ that pass through the $i$'th cell. Applying Corollary 2.6.4 to each cell we get

$$\frac{|S|}{2} \le |S_C| \lesssim \sum_i \left( \frac{|L_i|^2}{k^3} + \frac{|L_i|}{k} \right). \qquad (2.10)$$

Observe that $\sum_i |L_i| \le d \cdot |L|$ since each line passes through at most $d$ cells. Also, from our first regularity assumption (each point has at most $2k$ lines passing through it), we get that $\max_i |L_i| \lesssim 2k \cdot |S|/d^3$ (only lines through a point of $S$ in the cell matter here, and each cell contains $\lesssim |S|/d^3$ such points). This implies that

$$\sum_i |L_i|^2 \le \max_i |L_i| \cdot \sum_i |L_i| \lesssim \frac{|S| \cdot |L| \cdot 2k}{d^2} = \frac{|S| \cdot N^2 \cdot 2k}{d^2}.$$

Using these bounds in (2.10) we obtain

$$\frac{|S|}{2} \lesssim \frac{|S| \cdot N^2 \cdot 2}{k^2 \cdot d^2} + \frac{d \cdot N^2}{k} \le \frac{2|S|}{9} + O\!\left(\frac{N^3}{k^2}\right) < \frac{|S|}{2},$$

where the last inequality used the assumption that $|S| \gg N^3/k^2$. This is a contradiction and so we conclude that we must be in the algebraic case.



The algebraic case 

Observing the proof of the cellular case we see that we could also get a contradiction if $|S_C| \ge |S|/100$ or any other small constant. This only requires taking $d = D \cdot (N/k)$ for a larger constant $D$ (instead of $D = 3$). This means that we can also get $|S_H| \ge (1 - \epsilon)|S|$ for any constant $\epsilon$. Taking $\epsilon$ small enough and doing some averaging arguments (removing points not on $H$) we can actually reduce to the case where all points in $S$ are in $S_H$, so from now on we make this further simplifying assumption.

Thus, the situation is as follows: We have a set of points $S$ with $|S| \gg N^3/k^2$ such that all points lie on a hypersurface $H$ of degree $d \lesssim N/k$ and such that through every point in $S$ there are $k \ge 3$ lines in some set $L$ of $N^2$ lines. The assumption $k \ge 3$ will come into play now when we analyze algebraic properties of $H$ at the points in $S$.

Choosing $C$ large enough (so that $|S| \ge CN^3/k^2$) and using our second regularity assumption (that each line has many points on it) we get that each line in $L$ contains at least

$$\gtrsim \frac{|S| \cdot k}{N^2} \ge 10d$$

points in $S \subset H$ on it (the constant 10 will be important later). This means, in particular, that all the lines in $L$ are completely contained in $H$. Thus, each point in $S$ has at least three lines in $H$ passing through it. We will separate these points into 'critical' points, through which there are three non-coplanar lines (as in the joints problem), and 'flat' points, which are non-critical and through which there are three planar lines. We can also define 'critical' lines to be those lines that contain at least $5d$ critical points and 'flat' lines to be those that contain at least $5d$ flat points. Since each line has at least $10d$ points on it we have that each line is either critical or flat. We separate again into two cases depending on whether at least half the lines are critical or at least half are flat.



At least $N^2/2$ critical lines

Recall the proof of the joints conjecture. We saw that if a surface $H = \{h(x, y, z) = 0\}$ has three non-coplanar lines passing through a point $p \in H$ then the gradient

$$\nabla h = (\partial h/\partial x, \partial h/\partial y, \partial h/\partial z),$$

which is composed of three polynomials of degree $\le \deg(h)$, must vanish at $p$. Let $f_1, f_2, f_3 \in \mathbb{R}[x, y, z]$ denote the three components of the gradient (so that $f_1 = \partial h/\partial x$ etc.). Since they must vanish on all critical points, and since each critical line contains $\ge 5d$ critical points, we have that $f_1, f_2, f_3$ must also vanish on all critical lines. Thus, the hypersurface $H$ shares many lines with each of the three hypersurfaces $F_i = \{f_i(x, y, z) = 0\}$. We would like to say that this cannot happen. For an arbitrary pair of hypersurfaces $G_1 = \{g_1(x,y,z) = 0\}$ and $G_2 = \{g_2(x,y,z) = 0\}$ there can be no bound on the number of lines contained in both, since the two polynomials $g_1$ and $g_2$ might share a factor $r(x, y, z)$ (that is, $r$ divides both) such that the hypersurface $R = \{r(x,y,z) = 0\}$ contains many lines (perhaps an infinite number of lines). Fortunately, this is the only case where this can happen. We will now prove this fact using the following classical result known as Bezout's theorem.



Theorem 2.6.5 (Bezout). Let $f(x,y), g(x,y) \in \mathbb{R}[x,y]$ be two polynomials without a common factor. Then, $f$ and $g$ have at most $\deg(f) \cdot \deg(g)$ common roots.



The proof of this result is not hard but requires some discussion of resultants which is beyond the scope of this survey. Using Bezout's theorem we can prove the following claim.

Claim 2.6.6. Let $G_1 = \{g_1(x,y,z) = 0\}$ and $G_2 = \{g_2(x,y,z) = 0\}$ be two hypersurfaces so that $g_1$ and $g_2$ do not have a common factor. Then $G_1 \cap G_2$ can contain at most $\deg(g_1) \cdot \deg(g_2)$ lines.

Proof. Take a generic plane $A \subset \mathbb{R}^3$ and consider the restrictions $\tilde g_1(u, v)$ and $\tilde g_2(u, v)$ of $g_1, g_2$ to this plane ($u, v$ are new variables parametrizing the plane). It is not hard to show that, since $g_1, g_2$ do not share a factor, the restrictions to a generic plane will also not have a common factor. This means that $\tilde g_1, \tilde g_2$ can have at most $\deg(g_1) \deg(g_2)$ common roots. But a generic plane will intersect each of the lines contained in $G_1 \cap G_2$ in distinct points and so each line will add a common zero to $\tilde g_1, \tilde g_2$. This shows that the number of lines is bounded by $\deg(g_1) \deg(g_2)$. □

To use Claim 2.6.6 we need to argue that $h$ (the polynomial defining $H$) does not have a factor in common with its partial derivatives $f_1, f_2, f_3$. This, however, can happen if $h$ has a repeated factor. Namely, if we factor $h$ into its irreducible components $h = \prod_j p_j(x, y, z)^{a_j}$ then one of the $a_j$'s is at least 2. Such a polynomial will share a factor $p_j$ with each of its partial derivatives. In our case, since we are only interested in the set of zeros of $h$ (and do not mind reducing the degree) we can assume w.l.o.g that $h$ has no repeated factors (i.e., is square-free). For square-free polynomials, one can easily show that $h$ does not share a factor with at least one of the partial derivatives. This means that, using Claim 2.6.6, there are at most $d^2 \ll N^2/2$ lines, a contradiction. This brings us to the last case:

There are at least $N^2/2$ flat lines

This case is similar but will require us to use the assumption that no plane contains more 
than N lines. We saw that critical points are common zeros of some family of three low 
degree polynomials and that this family of polynomials cannot have a common factor with 
h. For flat points a similar statement holds but with 9 polynomials. Specifically: 

Claim 2.6.7. There are 9 polynomials $\pi_1,\ldots,\pi_9 \in \mathbb{R}[x, y, z]$ of degree at most $3d$ each such that:

1. Each flat point is a zero of all 9 polynomials $\pi_i$.

2. If $h(x,y,z)$ is 'plane-free' (i.e., if no irreducible factor of $h$ is of degree one) then $h$ does not share a factor with at least one of the polynomials $\pi_i$.

We will not prove this claim here and just say that these 9 mysterious polynomials are the 'second fundamental form' of the surface $H$ and are related to the second order derivatives of $h$ (or more precisely, to the quadratic approximation of $H$ at a point). Since the degrees of the $\pi_i$'s are bounded by $3d$ and since there are at least $5d$ flat points on each flat line we get that all flat lines are contained in the 9 hypersurfaces $\Pi_i = \{\pi_i(x,y,z) = 0\}$.

We now write $h(x,y,z) = h_p(x,y,z) \cdot h_n(x,y,z)$, where $h_p$ contains all the 'planar' (degree one) irreducible components of $h$ and $h_n$ is 'plane-free'. Thus, the hypersurface $H_p = \{h_p(x,y,z) = 0\}$ is the union of all planes contained in $H$; we write $H_n$ for the zero set of $h_n$. Using Claim 2.6.7 and Claim 2.6.6 we get that $H$ and $H_n$ can share at most $5d^2 \ll N^2$ lines and so there are many ($\gtrsim N^2$) lines contained in $H_p$ (clearly, each line in $H$ must be contained in one of the irreducible components). By a pigeonhole argument, and using the fact that $H_p$ can have at most $\deg(H_p) \le d$ degree one components, we get that at least $\gtrsim N^2/d > N$ lines are in some plane. This is a contradiction and so the proof of Lemma 2.6.2 is complete for the case $k \ge 3$.

2.6.2 The k = 2 case 

We now have to prove Lemma 2.6.2 in the case k = 2. Here we will eventually use the fact 
that at most N lines are in a doubly-ruled surface. Recall that a doubly ruled surface is a 
surface in which every point has two lines passing through it. A singly ruled surface is a 
surface in which every point has at least one line passing through it. We are not assuming 
anything on the number of lines contained in a singly ruled surface. It is known that there 
are only two examples (up to isomorphism) of doubly ruled surfaces, both of degree two. In 
order for us to not stray too far off our topic, we will state some facts about doubly (and 
singly) ruled surfaces without proof. The interested reader can find the missing proofs (or 
pointers to them) in [GK10b].

Let us begin with the proof and denote again by $S = I_{\ge 2}(L)$ the set of points of intersection of at least two lines in $L$. We will want to prove that $|S| \lesssim N^3$ so, for contradiction, suppose $|S| > C \cdot N^3$ for some large constant $C$ to be chosen later.

The proof uses again the polynomial method. This time, unlike the $k \ge 3$ case, we will jump straight to the 'algebraic case' and find a polynomial that vanishes on all lines in $L$. We already saw that a degree $\lesssim t^{1/3}$ polynomial can be found that vanishes on a given set of $t$ points in $\mathbb{R}^3$. We wish to prove a similar statement with lines instead of points. The next claim does just that (we state it only for $\mathbb{R}^3$ but a similar claim holds for any dimension and any field).

Claim 2.6.8 (Simple Interpolation). Let $\ell_1,\ldots,\ell_t$ be $t$ lines in $\mathbb{R}^3$. Then there exists a non-zero polynomial of degree $\le 10 \cdot t^{1/2}$ that vanishes on all of the lines $\ell_i$ (i.e., the restriction of the polynomial to each of the lines is identically zero).

Proof. A polynomial $f(x,y,z)$ of degree $10 \cdot t^{1/2}$ has $\binom{10 t^{1/2} + 3}{3} > 10 t^{1.5}$ coefficients. Each constraint of the form $f|_{\ell_i} \equiv 0$ ($f$ vanishes on $\ell_i$) gives at most $\deg(f) + 1 \le 10 t^{1/2} + 1$ homogeneous linear equations in the coefficients of $f$ (each coming from the vanishing of one of the coefficients of the univariate restriction to the line $\ell_i$). Thus, we have enough coefficients to satisfy all the constraints in a non-trivial way. □

Notice that this proof is very general and can work in many other settings (with lines replaced with almost any algebraic object you can think of).
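The parameter counting behind Claim 2.6.8 can be checked directly. The sketch below compares the number of coefficients of a degree $d$ polynomial in three variables with the number of linear constraints imposed by $t$ lines; the name `interpolation_counts` and the choice $d = \lfloor 10\sqrt{t} \rfloor$ are illustrative assumptions made for this example.

```python
from math import comb, isqrt

def interpolation_counts(t: int) -> tuple[int, int, int]:
    """Degree, number of coefficients and number of constraints for t lines in R^3."""
    d = isqrt(100 * t)                 # floor(10 * sqrt(t)), so d <= 10 * t^(1/2)
    num_coefficients = comb(d + 3, 3)  # monomials of degree <= d in x, y, z
    num_constraints = t * (d + 1)      # d + 1 coefficients of each restriction to a line
    return d, num_coefficients, num_constraints

if __name__ == "__main__":
    for t in (1, 100, 10_000):
        d, coeffs, constraints = interpolation_counts(t)
        # More unknowns than homogeneous equations means a non-zero solution exists.
        print(t, d, coeffs, constraints, coeffs > constraints)
```

Since the linear system is homogeneous, having strictly more coefficients than constraints is all that is needed for a non-trivial solution.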

This claim is indeed simple but not very useful in our case. Applying it directly on the set $L$ will give us a polynomial of degree $d \le 10N$ that vanishes on all lines in $L$. For reasons that will become clear later, we will actually need a polynomial of much smaller degree (a small constant times $N$) to vanish on $L$. Fortunately, $L$ is not an arbitrary set of lines (which would make this task impossible) but a set with many intersections. Since $|S| > CN^3$ we know that a constant fraction of the lines have at least $CN/10$ points of intersection on them. Replacing $C$ with $C/10$ and throwing away a small fraction of the lines we can assume w.l.o.g that each line in $L$ has at least $C \cdot N$ distinct points of intersection on it, where $C$ is some large constant to be chosen later. Using this additional structure we can prove the following improved version of the interpolation claim.

Claim 2.6.9 (Interpolation using incidences). Suppose $C$ is a large enough constant. Let $L$ be a set of $N^2$ lines in $\mathbb{R}^3$ such that each line in $L$ intersects at least $CN$ other lines in distinct points. Then, there exists a non-zero polynomial of degree $d \lesssim N/\sqrt{C}$ that vanishes on all lines in $L$.

Proof. Take a random subset $L'$ of $L$ by picking each line with probability $1/C$. With high probability, each line in $L$ (no typo, this is the original set $L$) will still intersect at least $N/2$ lines in $L'$. Use Claim 2.6.8 to find a polynomial $f(x,y,z)$ of degree $10\sqrt{|L'|} \lesssim N/\sqrt{C}$ that vanishes on $L'$. We now observe that $f$ must vanish also on $L$ since the restriction of $f$ to each line in $L$ has at least $N/2 > \deg(f)$ zeros (when $C$ is large enough). □

Let $f(x,y,z)$ be a polynomial of degree $d \lesssim N/\sqrt{C}$ given by the last claim such that $f$ vanishes on all lines in $L$. Let $F$ be the hypersurface defined by $f$. Write $f = \prod_i f_i(x,y,z)$ as a product of irreducible polynomials $f_i$ and recall that, w.l.o.g, $f$ is square-free and so no $f_i$ is repeated. Thus, $F$ is the union of the hypersurfaces $F_i$ defined by the different $f_i$'s. If we denote by $d_i$ the degree of $f_i$ we have that $d = \sum_i d_i$. We now partition the irreducible factors into 4 groups. Let $f_{pl}$ be the product of all $f_i$'s that are of degree one (corresponding to $F_i$'s that are planes). Let $f_{dr}$ be the product of the doubly-ruled irreducible components. Let $f_{sr}$ denote the product of the singly-ruled components and let $f_{nr}$ be the product of the non-ruled components. Also define $F_{pl}, F_{dr}, F_{sr}$ and $F_{nr}$ to be the hypersurfaces defined by these four polynomials.

Since each line $\ell \in L$ is contained in $F$, it must be contained in one of the irreducible factors of $F$. The set of incidences $S$ can be partitioned into incidences between lines in different factors and incidences of lines that are in the same factor. A line $\ell$ in a factor $f_i$ can intersect lines in factors not containing $\ell$ in at most $d$ points. This follows from the basic fact (that we proved in previous sections) that a line can have at most $\deg(H)$ intersections with a hypersurface $H$ not containing it. Here we use this fact for $H$ being the hypersurface defined by the product of the factors of $f$ that do not contain the line $\ell$ (which contains all lines that do not share a factor with $\ell$). This means that the total number of incidences between lines in different factors is bounded by $|L| d \lesssim N^3$.

We thus have to consider only incidences between lines that are in the same factor. Consider first intersections of lines in the factors of $F_{pl}$. Since there are at most $N$ lines in each plane, we have at most $N^2$ intersections inside each plane. Since there are at most $d \le N$ planar factors in $F_{pl}$, the total number of incidences of this kind is bounded by $\lesssim N^3$. The same argument precisely works also for intersections between lines in $F_{dr}$ using the assumption that every doubly ruled surface contains at most $N$ lines.

We now consider intersections of lines in $F_{sr}$. We will use the following fact from the theory of ruled surfaces:

Claim 2.6.10. Let $S \subset \mathbb{R}^3$ be a singly-ruled surface. Then, every line in $S$, with the exception of at most two lines, can intersect at most $\deg(S)$ other lines in $S$.

In other words, if there are 3 lines in $S$ that have more than $\deg(S)$ intersections with other lines in $S$, then $S$ must be doubly ruled. Using this claim, we can bound the number of incidences in $F_{sr}$ by $\lesssim N^3$ as follows. Each singly ruled factor $F_i \subset F_{sr}$ can have two 'exceptional' lines which can have at most $|L| = N^2$ incidences each. This sums up to $|\{\text{components of } F_{sr}\}| \cdot 2N^2 \lesssim dN^2 \lesssim N^3$. Each 'non-exceptional' line in a factor $F_i$ of $F_{sr}$ can contribute at most $\deg(F_i) = d_i \le d$ intersections and so the total is again bounded by $|L| d \lesssim N^3$.

We are left with the task of bounding the number of intersections of lines in $F_{nr}$. Suppose that there are more than $AN^3$ such intersections, where $A$ is some large constant to be chosen later. We will use the following claim without proof:

Claim 2.6.11. A non-ruled surface $S \subset \mathbb{R}^3$ can contain at most $\deg(S)^2$ lines.

This means that $F_{nr}$ can contain at most $d^2 \lesssim N^2/C$ lines. Notice that we can pick $A$ (which controls the number of incidences) and $C$ to be as large as we want and so we can argue by induction on the problem of bounding the number of incidences of lines in $F_{nr}$. That is, we can assume Lemma 2.6.2 (for $k = 2$) holds for $(N - 1)^2$ lines and then use this assumption on the lines in $F_{nr}$. This requires some careful choice of constants but can be carried out (we will not do this here). A delicate point is that we must satisfy the assumption that at most $N - 1$ lines in $F_{nr}$ are in any plane or doubly ruled surface. This can be achieved using the following argument: As long as there is a plane or doubly ruled surface containing more than $N - 1$ lines in $L' = L \cap F_{nr}$, remove these lines from $L'$. This can repeat at most $\lesssim N$ times and so the intersections between lines we have removed are at most $\lesssim N^3$. The remaining lines are not contained in any of the planes or doubly ruled surfaces we removed and so have at most $\lesssim N^3$ intersections with removed lines (using the same argument we used before). This will result in a small decrease in the constants that can be ignored since we can choose $C$ and $A$ to be sufficiently large constants.

2.7 Application of the Guth-Katz bound to sum-product estimates

We saw how the Szemeredi-Trotter theorem can be used to derive the sum-product theorem over the reals, showing that for every set $A \subset \mathbb{R}$ one of the sets $A + A$, $A \cdot A$ has size at least $|A|^{5/4}$. We will now see how the three dimensional incidence theorem proved by Guth and Katz can be used to give a similar estimate. Recall the GK bound:

Theorem 2.7.1 (Guth-Katz). Let $L$ be a set of $N^2$ lines in $\mathbb{R}^3$ such that no more than $N$ lines intersect at a single point and no plane or doubly ruled surface contains more than $N$ lines. Then the number of incidences of lines in $L$, $|I(L)|$, is at most $\lesssim N^3 \cdot \log N$.



Recently, Iosevich, Roche-Newton and Rudnev [IRNR11] used this theorem to prove:

Theorem 2.7.2. Let $A \subset \mathbb{R}$ be any set. Then

$$|A \cdot A - A \cdot A| \gtrsim \frac{|A|^2}{\log |A|}.$$



(The same result also holds with the minus sign replaced by a plus). Notice that, unlike 
the previous sum-product estimates we saw, this bound is tight up to the logarithmic factor. 
This follows by taking A to be an arithmetic progression. 

To get a feeling why such a bound should follow from the result of Guth-Katz observe that the GK bound for the distinct distances problem (obtained from Theorem 2.7.1 in a black-box way) automatically gives that, for a set $A \subset \mathbb{R}$, we have

$$|\{(a - b)^2 + (c - d)^2 \mid a, b, c, d \in A\}| \gtrsim \frac{|A|^2}{\log |A|}.$$

This is obtained by taking the set of points $P = A \times A$ in the plane and counting the distances defined by this set. This bound is also tight (up to logarithmic factors) for an arithmetic progression. To argue about $A \cdot A - A \cdot A$, however, we will need to use a slightly different reduction. Recall that the reduction from the distance problem to the incidence bound was obtained by considering the group of distance preserving linear mappings acting on the plane. To prove Theorem 2.7.2 we will need to consider mappings that preserve determinants (or areas). This is the group $SL_2(\mathbb{R})$ of $2 \times 2$ matrices with determinant equal to 1. Again, this is a three dimensional group and the trick will be to identify it with $\mathbb{R}^3$ in a way that the mappings sending a point $p$ to a point $q$ form a line in $\mathbb{R}^3$ (this will actually be simpler to show in this case).

For two vectors $v = (a, b)$ and $u = (c, d)$ we denote $\det(v, u) = ad - bc$. Observe that for four vectors $v, u, v', u'$, no two of which are multiples of each other, we have that $\det(v, u) = \det(v', u')$ iff there exists a map $T \in SL_2(\mathbb{R})$ that sends $v$ to $v'$ and $u$ to $u'$. One direction is obvious. To see the other direction let $v, u, v', u'$ be as above and observe that there is a unique linear map $T$ sending $v$ to $v'$ and $u$ to $u'$. We now have that $\det(v', u') = \det(T) \cdot \det(v, u)$ and so $\det(T) = 1$ as required.

Fix two vectors $v = (a, b), v' = (c, d) \in \mathbb{R}^2$ that are not multiples of each other (i.e., $\det(v, v') \ne 0$). Let $L_{v,v'} \subset SL_2(\mathbb{R})$ be the set of mappings $T$ with $T(v) = v'$. We wish to understand what this set looks like. If $v = (1, 0)$ and $v' = (0, 1)$ this is easy:

$$L_{(1,0),(0,1)} = \left\{ \begin{pmatrix} 0 & -1 \\ 1 & t \end{pmatrix} : t \in \mathbb{R} \right\}.$$



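For general $v = (a, b), v' = (c, d)$ we need to conjugate by the matrix taking $v, v'$ to $(1, 0), (0, 1)$. This gives the set:

$$\frac{1}{ad - bc} \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & t \end{pmatrix} \begin{pmatrix} d & -c \\ -b & a \end{pmatrix},$$

which gives the line

$$\left\{ \frac{1}{ad - bc} \begin{pmatrix} cd + ab - bct & -c^2 - a^2 + act \\ d^2 + b^2 - bdt & -cd - ab + adt \end{pmatrix} : t \in \mathbb{R} \right\}.$$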
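The explicit parametrization can be checked mechanically. The following sketch uses exact rational arithmetic to confirm, for sample integer vectors, that every matrix on the line sends $v$ to $v'$ and has determinant 1. The helper names `line_point`, `apply` and `det` are ours, and the sample vectors are arbitrary.

```python
from fractions import Fraction

def line_point(v, w, t):
    """Matrix with parameter t on the line L_{v,v'}, where v = (a, b) and w = v' = (c, d)."""
    a, b = v
    c, d = w
    s = Fraction(1, a * d - b * c)  # assumes integer inputs with det(v, v') != 0
    return (
        (s * (c * d + a * b - b * c * t), s * (-c * c - a * a + a * c * t)),
        (s * (d * d + b * b - b * d * t), s * (-c * d - a * b + a * d * t)),
    )

def apply(m, v):
    """Apply the 2 x 2 matrix m to the column vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def det(m):
    """Determinant of a 2 x 2 matrix given as a pair of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

if __name__ == "__main__":
    v, w = (2, 1), (1, 3)
    for t in (-3, 0, 1, Fraction(7, 2)):
        m = line_point(v, w, t)
        print(t, apply(m, v) == w, det(m) == 1)
```

Each entry of the matrix is affine in $t$, so as $t$ varies the matrices trace out a line in the four dimensional space of $2 \times 2$ matrices.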



The lines $L_{v,v'}$ are contained in the three dimensional surface $H = \{(x_1, x_2, x_3, x_4) \mid x_1 x_4 -
x_2 x_3 = 1\}$ which lives in $\mathbb{R}^4$. We can project this surface to $\mathbb{R}^3$ using the projection onto
$(x_1, x_2, x_3)$. This projection is one-to-one as long as $x_1 \neq 0$. Using a generic rotation around
zero, we can assume that this projection preserves the structure of the finite set of lines we
will be interested in (i.e., those coming from $v, v'$ with both coordinates in $A$).

Let $P = A \times A$ and let $L = \{L_{v,v'} \mid v, v' \in P\}$ be our set of $|P|^2$ lines. Following the
Elekes-Sharir framework we define the set

\[ Q(P) = \{(v, u, v', u') \in P^4 \mid \det(v, u) = \det(v', u')\}. \]

Applying Cauchy-Schwarz we get that

\[ |A \cdot A - A \cdot A| = |\{\det(v, u) \mid v, u \in P\}| \ge \frac{|P|^4}{|Q(P)|}. \]

On the other hand, since each 4-tuple in $Q(P)$ gives an intersection between two lines in $L$,
we have that

\[ |Q(P)| \sim |I(L)| = |\{(\ell, \ell') \in L^2 \mid \ell \cap \ell' \neq \emptyset\}|. \]

Thus, it will suffice to give a bound of $\lesssim |P|^3 \cdot \log |P|$ on $|I(L)|$. This bound will follow from
Theorem 2.7.1 if we can argue that the set $L$ satisfies the conditions of the theorem. As before, the
condition on at most $|P|$ lines through a single point follows from the fact that no mapping
can map a single point to two distinct points. The two conditions on planes and doubly ruled
surfaces can be verified from the explicit description of the lines in $L$ given above.

To prove the same statement for $A \cdot A + A \cdot A$ observe that the size of $Q(P)$ is the same
in this case (since $ad - bc = a'd' - b'c'$ iff $ad + b'c' = bc + a'd'$).
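The Cauchy-Schwarz step above, $|A \cdot A - A \cdot A| \ge |P|^4 / |Q(P)|$, can be checked by brute force on a small example. The sketch below is only an illustration: the set $A$ (a short arithmetic progression) and the function name `det_counts` are arbitrary choices.

```python
from collections import Counter
from itertools import product

def det_counts(A):
    """For P = A x A, count the pairs (v, u) in P x P attaining each value det(v, u)."""
    P = list(product(A, repeat=2))
    counts = Counter(v[0] * u[1] - v[1] * u[0] for v in P for u in P)
    return counts, len(P)

A = range(1, 7)                       # illustrative choice of A
r, size_P = det_counts(A)
distinct = len(r)                     # |A.A - A.A|
Q = sum(m * m for m in r.values())    # |Q(P)|: 4-tuples with equal determinants
print(distinct, size_P ** 4 / Q)      # number of determinants vs. the lower bound
```

The first printed number is the true count of distinct determinants and the second is the lower bound obtained from $|Q(P)|$.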
Chapter 3

Counting Incidences Over Finite 
Fields 



3.1 Ruzsa Calculus 

We begin developing the necessary machinery for proving a Szemeredi-Trotter type result
over prime finite fields. This result, due to Bourgain, Katz and Tao [BKT04], says that a
set of $N$ lines and $N$ points in $\mathbb{F}_p^2$ can have at most $O(N^{3/2 - \epsilon})$ incidences, where $\epsilon$ is some
positive real constant and $N$ is not too large. We will discuss the precise statement of this
theorem in Section 3.4 after we have developed some machinery from additive combinatorics
in this and the following two sections.

The first ingredient we will need is Ruzsa Calculus [Ruz96a]. This is a set of small claims
about additive structure in arbitrary abelian groups. The usefulness of this calculus will
become clear in the following sections.

Let $G$ be an abelian group and let $A, B \subset G$ be subsets. We already defined the sets
$A + B$ and $A - B$ of sums/differences of elements of $A$ and $B$. We can define $k \cdot A$ to be the set
$A + A + \ldots + A$, $k$ times. Be careful not to confuse this with the set $\{ka \mid a \in A\}$, which
always has size bounded by $|A|$. We will generally only work with finite subsets of $G$ and
so will omit the word 'finite' in all of our definitions/claims. We use the Cartesian product
notation $A \times B = \{(a, b) \mid a \in A, b \in B\}$.
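To make the distinction between the iterated sumset $k \cdot A$ and the dilate $\{ka \mid a \in A\}$ concrete, here is a small Python sketch over the integers. The function names `sumset`, `iterated_sum` and `dilate` are illustrative and not taken from the text.

```python
def sumset(A, B):
    """A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

def iterated_sum(A, k):
    """k . A = A + A + ... + A (k times), for k >= 1."""
    S = set(A)
    for _ in range(k - 1):
        S = sumset(S, A)
    return S

def dilate(A, k):
    """{k * a : a in A}, which never has more elements than A."""
    return {k * a for a in A}

A = {0, 1, 5}
print(sorted(iterated_sum(A, 2)))   # [0, 1, 2, 5, 6, 10]
print(sorted(dilate(A, 2)))         # [0, 2, 10]
```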

We begin with a simple, yet powerful, lemma known as Ruzsa's triangle inequality. 

Lemma 3.1.1 (Triangle Inequality). Let $A, B, C \subset G$. Then

\[ |A| \cdot |B - C| \le |A - B| \cdot |A - C|. \]

Proof. We can define two functions $f : B - C \to B$ and $g : B - C \to C$
such that $f(v) - g(v) = v$ for every $v \in B - C$ (these are not defined uniquely, just pick some pair of values with
difference $v$). Consider the mapping

\[ \phi : A \times (B - C) \to (A - B) \times (A - C) \]

given by

\[ \phi(a, v) = (a - f(v), a - g(v)). \]

Observe that $\phi$ is injective since we can recover $v$ from the difference between the two coordinates
of the output, and then recover $a$ from either coordinate. This implies the required bound on the set sizes. □
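Ruzsa's triangle inequality is easy to test numerically. The following sketch checks it on random finite subsets of the integers; the sampling ranges and the helper names are arbitrary, and the check is an illustration rather than a proof.

```python
import random

def diffset(X, Y):
    """X - Y = {x - y : x in X, y in Y}."""
    return {x - y for x in X for y in Y}

def triangle_holds(A, B, C):
    """Check |A| * |B - C| <= |A - B| * |A - C| for finite sets of integers."""
    return len(A) * len(diffset(B, C)) <= len(diffset(A, B)) * len(diffset(A, C))

rng = random.Random(0)
trials = [[set(rng.sample(range(50), rng.randint(1, 12))) for _ in range(3)]
          for _ in range(200)]
print(all(triangle_holds(A, B, C) for A, B, C in trials))
```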

To explain the name of this lemma, consider the 'Ruzsa Distance' between two sets:

\[ d(A, B) = \log \frac{|A - B|}{\sqrt{|A| \cdot |B|}}. \]

The lemma just proved shows that $d(B, C) \le d(A, B) + d(A, C)$, which justifies calling this
function 'distance' (though it is not a real distance function since $d(A, A)$ might be non zero).



Theorem 3.1.2 (Ruzsa Calculus). There exists an absolute constant $c$ such that the following
holds. Let $A, B, C \subset G$ be such that $|A| = |B| = |C| = N$.

1. If $|A + B| \le K \cdot N$ then $|A - B| \le K^c \cdot N$.

2. If $|A + B| \le K \cdot N$ then $|A + A| \le K^c \cdot N$.

3. If $|A + B|, |A + C| \le K \cdot N$ then $|B + C| \le K^c \cdot N$.

4. If $|A + B| \le K \cdot N$, $|C + C| \le K \cdot N$ and $|A \cap C| \ge K^{-1} \cdot N$ then $|C + B| \le K^c \cdot N$.

5. If $|A + B|, |A + C| \le K \cdot N$ then $|A + B + C| \le K^c \cdot N$.

6. If $|A + A| \le K \cdot N$ then for all non-negative integers $k, \ell$ there exists $c(k, \ell)$ such that
$|k \cdot A - \ell \cdot A| \le K^{c(k,\ell)} \cdot N$, where $c(k, \ell)$ does not depend on the group $G$ or on the set
$A$.



Proof. Throughout the proof we will (ab)use the constant $c$ freely and treat it as a 'generic'
constant that can change from one line to another (all inequalities will eventually work if we
pick $c$ large enough). A cleaner way to do this would be to define $A \lesssim B$ to mean $A \le K^c B$
for some absolute constant $c$.

We start with some useful notations (some of which will be familiar). Let

\[ Q(A, B) = \{(a, b, a', b') \in A \times B \times A \times B \mid a + b = a' + b'\} \]

and let $S(x) = \{(a, b) \in A \times B \mid a + b = x\}$ and $R(x) = \{(a, b) \in A \times B \mid a - b = x\}$. Then

\[ |Q(A, B)| = \sum_x |S(x)|^2 = \sum_x |R(x)|^2. \]

Recall that, using Cauchy-Schwarz, we get that

\[ |Q(A, B)| = \sum_x |S(x)|^2 \ge \frac{\left(\sum_x |S(x)|\right)^2}{|A + B|} = \frac{|A|^2 |B|^2}{|A + B|}. \]

Using the fact that $|Q(A, B)| = |Q(A, -B)|$ we also get that

\[ |Q(A, B)| \ge \frac{|A|^2 |B|^2}{|A - B|}. \]



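1. From

\[ |A||B| \cdot \max_x |R(x)| \ge \sum_x |R(x)|^2 \ge \frac{|A|^2 |B|^2}{|A + B|} \]

we get

\[ \max_x |R(x)| \ge \frac{|A||B|}{|A + B|}. \]

Let $x_0$ be such that $|R(x_0)| \ge \frac{|A||B|}{|A + B|}$. We define the map

\[ \phi : R(x_0) \times (A - B) \to (A + B) \times (A + B) \]

to be

\[ \phi((a, b), v) = (f(v) + b, g(v) + a), \]

where $f, g$ are fixed functions on $A - B$ such that $f(v) \in A$, $g(v) \in B$ and $f(v) - g(v) = v$.
Again, we can check that $\phi$ is injective, which gives

\[ \frac{|A||B|}{|A + B|} \cdot |A - B| \le |R(x_0)| \cdot |A - B| \le |A + B|^2. \]

Plugging in the bound $|A + B| \le KN$ we obtain the bound $|A - B| \le K^3 N$.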

2. Using the triangle inequality (Lemma 3.1.1) and 1. we get

\[ |B||A - A| \le |B - A||B - A| \le K^c \cdot N^2. \]

This bounds $|A - A|$; applying 1. to the pair $A, -A$ then bounds $|A + A|$.

3. Similarly,

\[ (N/K^c)|B + C| \le |A||B - C| \le |A - B||A - C| \le K^c |A + B||A + C| \le K^c N^2. \]

4. Using the triangle inequality and the previously proved parts

\[ (N/K)|B - C| \le |A \cap C||B - C| \le |(A \cap C) - B||(A \cap C) - C| \le |A - B||C - C| \le K^c N^2. \]
5. Here we need to do some work. The main step is to find a set $S$ of size roughly $N$ such
that $|S + (A + B)| \le K^c N$. Then we will have a bound on $|(A + B) + (A + B)|$, which
will become a bound on $|A + B + C|$ using 4. and the fact that some shift of $C$ has large
intersection with $A$ (and so with $A + B$). It is a good exercise to stop reading now and try
to fill in the rest of the proof.

We will take $S$ to be

\[ S = \{x \in G \mid |S(x)| \ge N/10K\}. \]

Since $\sum_x |S(x)|^2 \ge N^4/|A + B| \ge N^3/K$ we must have $|S| \ge N/10K$. Observe that each
element $x + (a + b) \in S + (A + B)$ has at least $N/10K$ distinct representations as a sum of
the form

\[ x + a + b = (a_i + a) + (b_i + b) \]

with $a, a_i \in A$, $b, b_i \in B$ s.t. $a_i + b_i = x$. This means that

\[ |S + (A + B)| \le (10K/N)|A + A||B + B| \le K^c N. \]

We now use part 2. to obtain

\[ |(A + B) + (A + B)| \le K^c N. \]

Since $|A - C| \le K^c N$ there exists an element $x \in G$ that can be written in at least $N/K^c$
ways as a difference $x = a - c$ with $a \in A, c \in C$. This means that $|(C + x) \cap A| \ge N/K^c$. This
implies that some shift of $C$, which we again denote by $C + x$, satisfies $|(C + x) \cap (A + B)| \ge N/K^c$.
We also know that $|(C + x) + (C + x)| \le |C + C| \le K^c N$
(by part 2., since $|A + C| \le K \cdot N$) and so, using 4., we get

\[ |(C + x) + (A + B)| \le K^c N, \]

which gives $|A + B + C| \le K^c N$.

6. This follows immediately from a repeated application of 5.

□
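The proof of part 1 actually yields the explicit inequality $|A| \cdot |B| \cdot |A - B| \le |A + B|^3$ (combine the lower bound on $|R(x_0)|$ with the injectivity of $\phi$). The sketch below tests this explicit form on random integer sets; the set sizes, ranges and names are illustrative assumptions.

```python
import random

def sumset(X, Y):
    return {x + y for x in X for y in Y}

def diffset(X, Y):
    return {x - y for x in X for y in Y}

def part1_holds(A, B):
    """Explicit form of part 1: |A| * |B| * |A - B| <= |A + B|^3."""
    return len(A) * len(B) * len(diffset(A, B)) <= len(sumset(A, B)) ** 3

rng = random.Random(1)
pairs = [(set(rng.sample(range(40), 8)), set(rng.sample(range(40), 8)))
         for _ in range(200)]
print(all(part1_holds(A, B) for A, B in pairs))
```

With $|A| = |B| = N$ and $|A + B| \le KN$ this is exactly the bound $|A - B| \le K^3 N$ from the proof.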

3.2 Growth in $\mathbb{F}_p$

Let $F = \mathbb{F}_p$ be a finite field of prime cardinality. Recall that such fields do not contain any
subfields. For simplicity, we will talk about prime fields but all of our arguments can be
extended to fields not containing large subfields. Our goal in this section will be to show that
for a set $A \subset F$ and for almost all values of $\lambda \in F$ we have $|A + \lambda A| \gg |A|$. Here we denote
by $\lambda A = \{\lambda a \mid a \in A\}$ (do not confuse this with $k \cdot A$ used for iterated sums of elements in
$A$). Later we will need to develop a 'distributional' variant of the same statement.
Our first step is to show that there is at least one good value of $\lambda$.
Lemma 3.2.1. Let $A \subset F$. Then there exists $\lambda \in F$ such that

\[ |A + \lambda A| \ge \frac{1}{2} \cdot \min\{|A|^2, p\}. \]

Proof. We will use the familiar notation $Q(A, B)$ for the set of quadruples $a + b = a' + b'$
with $a, a' \in A$ and $b, b' \in B$. Recall also that

\[ |Q(A, B)| \ge \frac{|A|^2 |B|^2}{|A + B|}. \]

Summing over $\lambda$ we get

\[ \sum_{\lambda \neq 0} |Q(A, \lambda A)| = |\{(a_1, a_2, a_3, a_4, \lambda) \in A^4 \times F^* \mid a_1 + \lambda a_2 = a_3 + \lambda a_4\}|. \]

The solutions with $a_1 = a_3$ and $a_2 = a_4$ contribute at most $|A|^2 (p - 1)$ to this sum (since $\lambda$ can
be anything). The solutions with $(a_1, a_2) \neq (a_3, a_4)$ determine a unique $\lambda$ and so contribute
at most $|A|^2 (|A|^2 - 1)$. Overall we have

\[ \sum_{\lambda \neq 0} |Q(A, \lambda A)| \le |A|^2 (p - 1) + |A|^2 (|A|^2 - 1). \]

This means that there exists $\lambda_0$ such that $|Q(A, \lambda_0 A)| \le |A|^2 + |A|^4/(p - 1)$. Together with
$|A + \lambda_0 A| \ge |A|^4 / |Q(A, \lambda_0 A)|$, this implies the required bound on $|A + \lambda_0 A|$. □
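For small primes Lemma 3.2.1 can be verified exhaustively over $\lambda$. The sketch below, with an arbitrarily chosen prime $p = 31$ and random sets $A$, computes the best growth $\max_{\lambda \neq 0} |A + \lambda A|$ and compares it with $\frac{1}{2}\min\{|A|^2, p\}$. The name `best_dilate` is hypothetical.

```python
import random

def best_dilate(A, p):
    """Return the maximum over non-zero lam in F_p of |A + lam * A|."""
    return max(len({(a + lam * b) % p for a in A for b in A}) for lam in range(1, p))

p = 31                                # a small prime, chosen only for illustration
rng = random.Random(2)
results = []
for size in range(1, 12):
    A = rng.sample(range(p), size)    # a random subset of F_p of the given size
    results.append(2 * best_dilate(A, p) >= min(size * size, p))
print(all(results))
```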

Let

\[ \mathrm{Stab}_K(A) = \{\lambda \in F^* \mid |A + \lambda A| \le K|A|\}. \]

The next lemma shows that the set $\mathrm{Stab}_K(A)$ behaves somewhat similarly to a subfield.

Lemma 3.2.2. There exists an absolute constant $c$ such that:

1. If $\lambda \in \mathrm{Stab}_K(A)$ then $-\lambda, 1/\lambda$ are in $\mathrm{Stab}_{K^c}(A)$.

2. If $\lambda_1, \lambda_2 \in \mathrm{Stab}_K(A)$ then $\lambda_1 \lambda_2, \lambda_1 + \lambda_2$ are in $\mathrm{Stab}_{K^c}(A)$.

Proof. The proof is an immediate application of Ruzsa Calculus (see last section):

1. The claim about $-\lambda$ follows from Ruzsa Calculus. The claim about $1/\lambda$ is trivial since
$|A + \lambda A| = |(1/\lambda)A + A|$.

2. Using Ruzsa Calculus we have

\[ |A + (\lambda_1 + \lambda_2)A| \le |A + \lambda_1 A + \lambda_2 A| \le K^c |A|. \]

To argue about the product observe that $|A + \lambda_1 A| \le K|A|$ and $|A + (1/\lambda_2)A| \le K|A|$ so,
by Ruzsa Calculus, we have

\[ |A + (\lambda_1 \lambda_2)A| = |\lambda_1 A + (1/\lambda_2)A| \le K^c |A|. \]

□
We would like to argue that, if $\mathrm{Stab}_K(A)$ is large, then for some $c$, $\mathrm{Stab}_{K^c}(A)$ contains
all of $F$ (contradicting Lemma 3.2.1). This will be obtained by the following lemma.

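Lemma 3.2.3. Let $A \subset F$. Then

\[ |3 \cdot A^2 - 3 \cdot A^2| \ge \frac{1}{2} \min\{|A|^2, p\}, \]

where

\[ 3 \cdot A^2 - 3 \cdot A^2 = A \cdot A + A \cdot A + A \cdot A - A \cdot A - A \cdot A - A \cdot A. \]

Proof. Observe that, if $\lambda \notin \frac{A - A}{A - A}$ then $|A + \lambda A| = |A|^2$. We divide the proof into two cases:

Case 1: $\frac{A - A}{A - A} \neq F$. In this case there must exist $\lambda \in \frac{A - A}{A - A}$ such that $\lambda + 1 \notin \frac{A - A}{A - A}$ (here
we use the particular structure of the field $\mathbb{F}_p$). This implies $|A + (\lambda + 1)A| = |A|^2$. Write
$\lambda = \frac{a_1 - a_3}{a_2 - a_4}$. We have

\[ (a_2 - a_4)(A + (\lambda + 1)A) \subset (a_2 - a_4)A + (a_1 - a_3 + a_2 - a_4)A \subset 3 \cdot A^2 - 3 \cdot A^2. \]

And since the size of the set on the left is $|A|^2$ we are done.

Case 2: $\frac{A - A}{A - A} = F$. Then, from Lemma 3.2.1 we have that there exists $\lambda \in F$ such that
$|A + \lambda A| \ge \frac{1}{2} \min\{|A|^2, p\}$. Write $\lambda = \frac{a_1 - a_3}{a_2 - a_4}$. Then

\[ (A + \lambda A)(a_2 - a_4) \subset 3 \cdot A^2 - 3 \cdot A^2 \]

and the bound follows also in this case. □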
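The growth guaranteed by Lemma 3.2.3 can also be observed directly. The following sketch builds the set $3 \cdot A^2 - 3 \cdot A^2$ inside $\mathbb{F}_p$ for an illustrative prime $p = 101$ and random sets $A$, and compares its size with $\frac{1}{2}\min\{|A|^2, p\}$. The function name is a hypothetical helper.

```python
import random

def three_A2_minus_three_A2(A, p):
    """Compute 3.A^2 - 3.A^2 = A.A + A.A + A.A - A.A - A.A - A.A inside F_p."""
    prod = {(a * b) % p for a in A for b in A}             # A.A
    diff = {(x - y) % p for x in prod for y in prod}       # A.A - A.A
    two = {(x + y) % p for x in diff for y in diff}        # 2.A^2 - 2.A^2
    return {(x + y) % p for x in two for y in diff}        # 3.A^2 - 3.A^2

p = 101                               # illustrative prime
rng = random.Random(3)
ok = []
for size in range(1, 9):
    A = rng.sample(range(p), size)
    ok.append(2 * len(three_A2_minus_three_A2(A, p)) >= min(size * size, p))
print(all(ok))
```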

An immediate corollary of Lemma 3.2.3 is the following

Corollary 3.2.4. Let $A \subset F$ be of size $p^\delta$. Then $k \cdot A^k - k \cdot A^k = F$ for some $k = k(\delta)$.

Proof. Applying Lemma 3.2.3 repeatedly gets us all the way up to size $p/2$. To make the final jump
observe that, if $|A| > p/2$ ($p$ is odd!) then $A \cap (x - A) \neq \emptyset$ for all $x \in F$. This means that
$A + A = F$ and so one more addition will finish the job. □

Combining the above we get to our goal:

Theorem 3.2.5. Let $A, T \subset F$ with $p^\alpha \le |A| \le p^{1-\alpha}$ and $|T| \ge p^\beta$. Then there exists $\lambda \in T$
such that $|A + \lambda A| \ge |A|^{1 + c(\alpha,\beta)}$, where $c(\alpha, \beta) > 0$ is a constant depending only on $\alpha$ and $\beta$.

Proof. We let $K = |A|^{c(\alpha,\beta)}$. If the theorem is not true then $T \subset \mathrm{Stab}_K(A)$, which implies
that taking some $k = k(\alpha, \beta)$ sums and products (as in Corollary 3.2.4) we will have $F =
\mathrm{Stab}_{K'}(A)$ with $K' \le K^{c(k)}$ (with $c(k)$ depending only on $k$). Picking $c(\alpha, \beta)$ small enough we get that for all $\lambda \in F$,
$|A + \lambda A| \le K'|A| \ll \min\{|A|^2, p\}$. This contradicts Lemma 3.2.1. □
Our next goal will be to prove a 'distributional' version of Theorem 3.2.5.



3.3 The Balog-Szemeredi-Gowers theorem 

In the previous section we showed: 

Theorem 3.3.1. Let $A, T \subset F$ with $p^\alpha \le |A| \le p^{1-\alpha}$ and $|T| \ge p^\beta$. Then there exists $\lambda \in T$
such that $|A + \lambda A| \ge |A|^{1 + c(\alpha,\beta)}$, where $c(\alpha, \beta) > 0$ is a constant depending only on $\alpha$ and $\beta$.



Let's try to see why this is the kind of result we could hope to use to prove the ST theorem
and why it is not really strong enough. We will demonstrate this by considering a very special
case of a line/point arrangement. Let $P$ and $L$ be sets of $N$ points and $N$ lines in $F^2$ with
$N \sim p^\delta$ for some 'nice' $\delta$ (say, between 1/4 and 7/4). Suppose also that $P = P_x \times P_y$ with
$|P_y| \lesssim N^{1/2 + \epsilon}$ and that the lines in $L$ are given by equations of the form $Y = aX + b$
with $a, b \in A$ and $|A| \lesssim N^{1/2}$. Then, if $|I(P, L)| \ge N^{3/2 - \epsilon}$, for at least $N^{1 - \epsilon}$ lines
$Y = aX + b$ there will be at least $N^{1/2 - \epsilon}$ values of $x \in P_x$ for which $ax + b \in P_y$ (this
is the 'typical' value required to obtain $N^{3/2 - \epsilon}$ intersections). This means that $A + xA$ is
'small' (contained in $P_y$) for many values of $x$. This would contradict Theorem 3.3.1 if the
information was complete (i.e., if we knew that for all $a, b \in A$ and $x \in P_x$, $b + xa \in P_y$).
However, the information is given in an incomplete form, as a quantitative incidence bound,
and so we need a stronger version of Theorem 3.3.1 that can handle such information.

The idea is to work with $Q(A, B)$ instead of $|A + B|$. Recall that

\[ |A + B| \ge \frac{|A|^2 |B|^2}{|Q(A, B)|}. \]

Thus, if we define

\[ E(A, B) = \frac{|A|^2 |B|^2}{|Q(A, B)|} \]

we have $\max\{|A|, |B|\} \le E(A, B) \le |A + B|$. We will call the quantity $E(A, B)$ the additive
energy (or just energy) of $A + B$, as it relates to the $\ell_2$ norm of the distribution obtained
by sampling $a, b$ independently in $A, B$ and summing them. (We normalize $E(A, B)$ so that it is
in the same scale as $|A + B|$. In some texts other scalings are used, e.g. in [Gre09] the scaling
is so that $E(A, B)$ is in the range $[0, 1]$. In some places, the term additive energy is used for
the quantity $|Q(A, B)|$.) As an example, consider an
arithmetic progression $A$ of size $N$ and notice that, in this case, both $|A + A|$ and $E(A, A)$ are
bounded by $\lesssim N$. Now, let $B$ be a set of size $N$ with $|B + B| = (1/2)|B|(|B| + 1)$ (i.e., a set with
no dependencies). Here we also have $|B + B| \sim E(B, B) \sim |B|^2$. However, taking $C = A \cup B$
we get that $|C + C| \ge |B + B| \gtrsim |C|^2$ but $E(C, C) \lesssim N$ (because $|Q(C, C)| \ge |Q(A, A)|$).
That is, the energy $E(A, B)$ can capture information about sufficiently large subsets of $A$
(or $B$) that do not grow in addition, information that is not captured by $|A + B|$. A
partial converse to this statement is given by the following important result known as the
Balog-Szemeredi-Gowers Theorem (or BSG for short).
Theorem 3.3.2 ([BS94, Gow98]). Let $A, B \subset G$ be sets of size $N$ in an abelian group
$G$. Suppose that $E(A, B) \le KN$. Then, there exist subsets $A' \subset A$ and $B' \subset B$ with
$|A'|, |B'| \ge N/K^c$ and with $|A' + B'| \le K^c N$. Here, $c > 0$ is some absolute constant.
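The contrast between sumset size and energy described before the theorem can be seen numerically. In the sketch below $A$ is an arithmetic progression, $B$ is a set of shifted powers of 4 (chosen here only because all of its pairwise sums are distinct), and $C = A \cup B$; with the normalization $E(A, B) = |A|^2|B|^2/|Q(A, B)|$, the energy of $C$ stays linear in $N$ while $|C + C|$ is quadratic. The specific sets and function names are illustrative assumptions.

```python
from collections import Counter

def energy(A, B):
    """E(A, B) = |A|^2 |B|^2 / |Q(A, B)|, where |Q(A, B)| = sum over x of |S(x)|^2."""
    S = Counter(a + b for a in A for b in B)
    Q = sum(m * m for m in S.values())
    return len(A) ** 2 * len(B) ** 2 / Q

def sumset_size(A, B):
    return len({a + b for a in A for b in B})

N = 64
A = list(range(N))                           # arithmetic progression
B = [10 ** 6 + 4 ** i for i in range(N)]     # pairwise sums are all distinct
C = A + B                                    # the union (A and B are disjoint)
for name, X in (("A", A), ("B", B), ("C", C)):
    print(name, sumset_size(X, X), round(energy(X, X), 1))
```

For each set the second printed number (the energy) is at most the first (the sumset size), and for $C$ the gap between the two is large.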



This theorem will allow us (with some work) to derive an additive energy version of
Theorem 3.3.1 with $|A + \lambda A|$ replaced by $E(A, \lambda A)$. The BSG theorem will actually follow
from a relatively generic graph theoretic lemma which we now state.

Lemma 3.3.3. Let $H \subset V \times U$ be a bipartite graph with $|V| = |U| = N$. Suppose $|H| \ge \alpha N^2$
(the number of edges). Then, there are subsets $V' \subset V$ and $U' \subset U$ with $|V'|, |U'| \ge \alpha^c N$
and such that for all $v \in V'$, $u \in U'$ there are at least $\alpha^c N^2$ paths of length three between $v$
and $u$.



Before proving Lemma 3.3.3, let's see how it implies the BSG theorem.



Proof of Theorem 3.3.2: Suppose $E(A, B) \le KN$. Then $|Q(A, B)| \ge N^3/K$. Let $P$ be
the set of values $x$ with $|R(x)| = |\{(a, b) \in A \times B \mid a - b = x\}| \ge N/2K$. This is the set of
'popular differences' and can be seen to have size at least $|P| \ge N/2K$ (we saw a similar argument
in the proof of Theorem 3.1.2). Consider the graph $H \subset A \times B$ whose edges correspond to pairs $(a, b)$ with
$a - b \in P$ and label each such edge $(a, b)$ with the value $a - b$. From the definition of $P$ we
have that $H$ has at least $|P|(N/2K) \ge (1/4K^2) \cdot N^2$ edges and so, applying Lemma 3.3.3, we
have subsets $A' \subset A$ and $B' \subset B$ with $|A'|, |B'| \ge N/K^c$ and such that, for all $(a, b) \in A' \times B'$
there are at least $N^2/K^c$ paths of length three between $a$ and $b$ in the graph $H$. Consider
such a path $a \to b' \to a' \to b$. Writing

\[ a - b = (a - b') - (a' - b') + (a' - b) \]

and using the fact that all three differences in this sum are popular, we see that $a - b$ can
be written in at least $N^2/K^c$ distinct ways as $a - b = x_1 - x_2 + x_3$ with $x_1, x_2, x_3 \in P$. Since
$|P| \le 2KN$, this implies

\[ |A' - B'| \le \frac{|P|^3}{N^2/K^c} \le K^{c'} N \]

for some other absolute constant $c' > 0$. Going from $|A' - B'|$ to $|A' + B'|$ is possible using
Ruzsa Calculus and loses another constant. □
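The 'popular differences' step at the start of the proof can be illustrated as follows. The sketch computes $K$ from the energy, collects the differences $x$ with $|R(x)| \ge N/2K$, and prints the size of this set next to the lower bound $N/2K$. The example sets and the name `popular_differences` are hypothetical.

```python
from collections import Counter

def popular_differences(A, B):
    """Return (P, K) where E(A, B) = K * N and P = {x : |R(x)| >= N / (2K)}.
    Assumes |A| = |B| = N, as in the statement of the BSG theorem."""
    N = len(A)
    R = Counter(a - b for a in A for b in B)
    Q = sum(m * m for m in R.values())      # |Q(A, B)| = sum over x of |R(x)|^2
    K = N ** 3 / Q                          # since E(A, B) = N^4 / |Q(A, B)|
    P = {x for x, m in R.items() if m >= N / (2 * K)}
    return P, K

# Half of A is structured (a progression), the other half is sparse.
A = list(range(20)) + [1000 + 3 ** i for i in range(20)]
B = list(range(40))
P, K = popular_differences(A, B)
print(len(P), len(A) / (2 * K))
```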

To prove Lemma 3.3.3 we first prove a simpler lemma on paths of length two. We will
denote by $\Gamma(S)$ the set of neighbors of $S$ in the graph $H$, where $S$ is some subset of the
vertices of either $V$ or $U$.
Lemma 3.3.4. Let $H \subset V \times U$ be as in Lemma 3.3.3. Then, for every $\epsilon > 0$ there exists a
set $V' \subset V$ with $|V'| \ge (\alpha/2)N$ and s.t.

\[ |\{(v_1, v_2) \in V' \times V' \mid |\Gamma(v_1) \cap \Gamma(v_2)| < (\epsilon\alpha^2/2)N\}| \le \epsilon|V'|^2. \]

In other words, for all but an $\epsilon$ fraction of the pairs of vertices in $V'$, the pair will have at
least $(\epsilon\alpha^2/2)N$ common neighbors (or paths of length two).

Proof. The proof uses a clever trick introduced by Gowers, which relies on a 'somewhat'
random choice of the set $V'$. Picking the set $V'$ completely at random does not seem to work.
The idea is to choose $V'$ as the set of neighbors $\Gamma(u)$ of a random vertex $u \in U$. This makes
sense, since a pair $(v_1, v_2)$ with few common neighbors is less likely to be in $V'$ than a pair
that has many common neighbors. Let's see the calculation.
Denote the set of 'bad' pairs by

\[ S = \{(v_1, v_2) \in V \times V \mid |\Gamma(v_1) \cap \Gamma(v_2)| < (\epsilon\alpha^2/2)N\}. \]

For each $u \in U$ let $S_u = S \cap (\Gamma(u) \times \Gamma(u))$ denote the set of bad pairs among the neighbors of $u$.
Suppose we pick $u$ at random and consider first the expectation of $|\Gamma(u)|^2$ (the total number
of pairs among neighbors of $u$). Using Cauchy-Schwarz we have:

\[ E_u[|\Gamma(u)|^2] \ge (E_u[|\Gamma(u)|])^2 \ge \alpha^2 N^2. \]

We also have

\[ E_u[|S_u|] = \sum_{(v_1, v_2) \in S} \Pr_u[v_1, v_2 \in \Gamma(u)] = \sum_{(v_1, v_2) \in S} \frac{|\Gamma(v_1) \cap \Gamma(v_2)|}{N} < N^2 \cdot (\epsilon\alpha^2/2). \]

Combining the two bounds we get

\[ E_u[\epsilon|\Gamma(u)|^2 - |S_u|] \ge (\epsilon\alpha^2/2)N^2. \]

This implies that there is a choice of $u$ with $|\Gamma(u)| \ge (\alpha/\sqrt{2})N$ and $|S_u| \le \epsilon|\Gamma(u)|^2$, as was required.

□
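The averaging argument in this proof can be replayed on a concrete graph. The sketch below takes a random bipartite graph (an arbitrary test instance), scores every candidate $V' = \Gamma(u)$ by $\epsilon|\Gamma(u)|^2 - |S_u|$, and checks that the best score is at least $(\epsilon\alpha^2/2)N^2$, where $\alpha$ is the actual edge density. All names and parameters are illustrative.

```python
import random

def best_neighbourhood(adj_U, N, eps):
    """adj_U[u] is the set of neighbours (in V) of the vertex u of U.
    Returns (max over u of eps * |Gamma(u)|^2 - |S_u|, edge density alpha)."""
    alpha = sum(len(nb) for nb in adj_U) / N ** 2
    # common[v1][v2] = number of common neighbours of v1 and v2 (ordered pairs).
    common = [[0] * N for _ in range(N)]
    for nb in adj_U:
        for v1 in nb:
            for v2 in nb:
                common[v1][v2] += 1
    threshold = eps * alpha ** 2 / 2 * N
    best = max(
        eps * len(nb) ** 2
        - sum(1 for v1 in nb for v2 in nb if common[v1][v2] < threshold)
        for nb in adj_U
    )
    return best, alpha

N, eps = 40, 0.1
rng = random.Random(4)
adj_U = [{v for v in range(N) if rng.random() < 0.3} for _ in range(N)]
best, alpha = best_neighbourhood(adj_U, N, eps)
print(best >= eps * alpha ** 2 / 2 * N ** 2)
```

The maximum is at least the expectation, which is why some vertex $u$ always gives a good set $V' = \Gamma(u)$.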



Proof of Lemma 3.3.3: We will omit some of the detailed calculations (which can be
easily filled in). By throwing away a small fraction of the vertices we can reduce to the
case where the minimum degree of a vertex is at least $(\alpha/2)N$. Let $V' \subset V$ be given by
Lemma 3.3.4 so that $|V'| \gtrsim \alpha N$ and such that for all but $\epsilon|V'|^2$ pairs $(v_1, v_2) \in V' \times V'$ we have
$|\Gamma(v_1) \cap \Gamma(v_2)| \gtrsim \epsilon\alpha^2 N$ (we will pick $\epsilon$ later). Notice that there might be some vertices $v_1 \in V'$
for which there are many (even all) vertices $v_2 \in V'$ that have few common neighbors with
them. We can, however, find a subset $V'' \subset V'$ with $|V''| \sim |V'|$ and such that for all $v_1 \in V''$
there are at most $2\epsilon|V'|$ vertices $v_2 \in V'$ with $|\Gamma(v_1) \cap \Gamma(v_2)| \lesssim \epsilon\alpha^2 N$. Next, we can find a
subset $U' \subset U$ with $|U'| \ge \alpha^2 N$ such that each $u \in U'$ has at least $10\epsilon|V'|$ neighbors in $V''$
(here we need to choose $\epsilon$ sufficiently small, but still polynomial in $\alpha$). This part uses the
fact that the minimum degree is large and so there is a quadratic number of edges leaving
$V''$. Now, fix $u \in U'$ and $v \in V''$. We will build many paths of length three between $u$ and $v$
as follows: Start with $u$ and move to a neighbor of $u$ in $V''$. There are at least $10\epsilon|V'|$ options
for this step and at most $2\epsilon|V'|$ of them will have fewer than $\epsilon\alpha^2 N$ common neighbors with
$v$ (this is how we defined $V''$). This means that, for each of the remaining options, we can complete the path in at least $\epsilon\alpha^2 N$
ways. This gives $\ge \alpha^c N^2$ distinct paths of length three. □

3.3.1 Energy version of growth in $\mathbb{F}_p$

We will now use the BSG theorem to derive an energy version of Theorem 3.3.1. The proof
will follow from the following result (due to Bourgain [Bou09]).

Theorem 3.3.5. Let $F = \mathbb{F}_p$ with $p$ prime. Let $A \subset F$ and $T \subset F^*$. Suppose that for all
$\lambda \in T$ we have $E(A, \lambda A) \le K|A|$. Then, there exist $A' \subset A$ and $T' \subset xT$ (for some $x \in F^*$)
such that $|A'| \ge |A|/K^c$, $|T'| \ge |T|/K^c$ and with $|A' + \lambda A'| \le K^c |A'|$ for all $\lambda \in T'$.

Proof. Using the BSG theorem (Theorem 3.3.2) we can find sets $X_\lambda, Y_\lambda \subset A$ for each $\lambda \in T$
such that $|X_\lambda|, |Y_\lambda| \ge |A|/K^c$ and such that $|X_\lambda + \lambda Y_\lambda| \le K^c |A|$ for all $\lambda \in T$. We will want
to somehow 'paste' many of these sets together. We start with a simple claim.

Claim 3.3.6. Let $S_1, \ldots, S_k \subset S$ be finite sets with $|S_i| \ge \delta|S|$ for all $i \in [k]$. Then, there
exists $i \in [k]$ such that

\[ |\{j \in [k] \mid |S_i \cap S_j| \ge (\delta^2/2)|S|\}| \ge (\delta^2/2)k. \]

Proof. Observe that

\[ \sum_{i,j} |S_i \cap S_j| = \sum_{x \in S} |\{i \mid x \in S_i\}|^2 \ge \frac{\left(\sum_{x \in S} |\{i \mid x \in S_i\}|\right)^2}{|S|} \ge k^2 \delta^2 |S|. \]

If we take $i \in [k]$ such that

\[ \sum_j |S_i \cap S_j| \ge k\delta^2 |S| \]

we will get the required property.

□
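Claim 3.3.6 is a purely combinatorial statement and can be tested on random set families. In the sketch below $\delta$ is taken to be the smallest density $|S_i|/|S|$ in the family; the universe size, family size and the name `claim_holds` are arbitrary illustrative choices.

```python
import random

def claim_holds(sets, universe_size):
    """Check Claim 3.3.6 for a family of subsets, with delta = min_i |S_i| / |S|."""
    k = len(sets)
    delta = min(len(s) for s in sets) / universe_size
    thr = delta ** 2 / 2
    # Look for an index i whose set meets many other sets in a large intersection.
    return any(
        sum(1 for Sj in sets if len(Si & Sj) >= thr * universe_size) >= thr * k
        for Si in sets
    )

rng = random.Random(5)
universe = range(60)
families = [[set(rng.sample(universe, rng.randint(6, 30))) for _ in range(25)]
            for _ in range(50)]
print(all(claim_holds(f, len(universe)) for f in families))
```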



Using the claim we can find some $\lambda_0 \in T$ and a subset $T' \subset T$ with $|T'| \ge |T|/K^c$
(remember our convention to 'reuse' the constant $c$) such that for all $\lambda \in T'$ we have
$|X_{\lambda_0} \cap X_\lambda|, |Y_{\lambda_0} \cap Y_\lambda| \ge |A|/K^c$. Notice that, to get this to work, we need to apply the claim with
$S = A \times A$ and the family of sets $S_\lambda = X_\lambda \times Y_\lambda$. We find $\lambda_0$ such that $S_{\lambda_0}$ has intersection at
least $|A|^2/K^c$ with $S_\lambda$ for all $\lambda$ in some large set $T'$ and then argue about the intersections
of the projections $X_\lambda, Y_\lambda$.

In what follows we will use Ruzsa Calculus (RC) very freely (without stating each time
exactly what part we are using) and the reader is advised to recall the different claims
involved. We will use the notation $X \approx Y$ to mean $|X + Y| \le K^c |A|$ for some absolute
constant $c$ (this notation is only for this proof). We know that $X_\lambda \approx \lambda Y_\lambda$ for all $\lambda \in T$. Thus
$X_\lambda \approx X_\lambda$ for all $\lambda$ and in particular $X_{\lambda_0} \approx X_{\lambda_0}$. Using RC and the fact that $|X_{\lambda_0} \cap X_\lambda|$ is large
for all $\lambda \in T'$ we get that $X_{\lambda_0} \approx \lambda Y_\lambda$ for all $\lambda \in T'$. In the same way, since $Y_{\lambda_0} \cap Y_\lambda$ is large,
we get that $X_{\lambda_0} \approx \lambda Y_{\lambda_0}$ for all $\lambda \in T'$. Using the triangle inequality, we get $\lambda_0 Y_{\lambda_0} \approx \lambda Y_{\lambda_0}$
for all $\lambda \in T'$, which is the same as $Y_{\lambda_0} \approx \frac{\lambda}{\lambda_0} Y_{\lambda_0}$. Set $T'' = (1/\lambda_0)T'$ and $A' = Y_{\lambda_0}$ and the
theorem follows. □

Combining Theorem 3.3.5 with Theorem 3.3.1 we immediately get to our previously
described goal:

Corollary 3.3.7 (Growth in energy). Let $A, T \subset F$ with $p^\alpha \le |A| \le p^{1-\alpha}$ and $|T| \ge p^\beta$.
Then there exists $\lambda \in T$ such that $E(A, \lambda A) \ge |A|^{1 + c(\alpha,\beta)}$, where $c(\alpha, \beta) > 0$ is a constant
depending only on $\alpha$ and $\beta$.



3.4 Szemeredi-Trotter in finite fields 

In the previous section we proved the energy version of the growth theorem in $F = \mathbb{F}_p$.

Corollary 3.4.1 (Growth in energy). Let $A, T \subset F$ with $p^\alpha \le |A| \le p^{1-\alpha}$ and $|T| \ge p^\beta$.
Then there exists $\lambda \in T$ such that $E(A, \lambda A) \ge |A|^{1 + c(\alpha,\beta)}$, where $c(\alpha, \beta) > 0$ is a constant
depending only on $\alpha$ and $\beta$.

We will now see how to use this theorem to give a non trivial bound of $N^{3/2 - \epsilon}$ for some
constant $\epsilon > 0$ on the number of incidences of $N$ points and $N$ lines in $F^2$. We wish to prove:

Theorem 3.4.2 (ST over finite fields [BKT04]). Let $L$ be a set of $N$ lines in $F^2$ and let $P$
be a set of $N$ points in $F^2$. Then, if $p^\alpha \le N \le p^{2-\alpha}$ for some $\alpha > 0$ then $|I(P, L)| \lesssim N^{3/2 - \epsilon}$,
where $\epsilon > 0$ depends only on $\alpha$.

The proof will be in two steps:

1. Reduce the problem to the case when the $N$ points are contained in an $N^{1/2}$ by $N^{1/2}$
grid $A \times B \subset F^2$.

2. Prove the required bound over a grid using Corollary 3.4.1.

We will use the following notations: for a point $p \in P$ denote by $L(p) = \{\ell \in L \mid p \in \ell\}$ and
for a line $\ell \in L$ denote $P(\ell) = \{p \in P \mid p \in \ell\}$. Suppose $P, L$ are such that $|I(P, L)| \gg N^{3/2 - \epsilon}$
where $\epsilon$ will be chosen small enough to derive a contradiction later. We start with throwing
away some lines/points to ensure certain regularity conditions. First, remove all lines with
at most $N^{1/2 - 2\epsilon}$ points on them. This can reduce the number of incidences only by a negligible
fraction. Since there are still $\gtrsim N^{3/2 - \epsilon}$ incidences we must have at least $N^{1 - \epsilon}$ lines left
after this step (otherwise use Cauchy-Schwarz to bound the number of incidences). Next,
remove all lines that have at least $N^{1/2 + 2\epsilon}$ points on them. Recall that by Cauchy-Schwarz
we have a bound of $N^{3/2}$ on the number of incidences and so in this second step we will
remove at most $N^{1 - 2\epsilon}$ lines, which means that we will still have at least $\gtrsim N^{1 - \epsilon}$ lines with
at least $N^{1/2 - 2\epsilon}$ points on each. Thus, the total number of incidences will remain at least
$N^{3/2 - 3\epsilon}$. At this point we have that for each line $\ell \in L$,

\[ N^{1/2 - 2\epsilon} \le |P(\ell)| \le N^{1/2 + 2\epsilon}. \]

We can perform the same procedure on points and obtain that for all $p \in P$, $N^{1/2 - 2\epsilon} \le
|L(p)| \le N^{1/2 + 2\epsilon}$. Since we are only removing points we will still have after this step the
bound $|P(\ell)| \le N^{1/2 + 2\epsilon}$ for each line (the lower bound might not hold).

3.4.1 Translating the problem to a grid 

To translate our problem to a grid we first find two points $p_0, p_1 \in P$ such that most incidences
happen on intersections of a line through $p_0$ and a line through $p_1$.

Claim 3.4.3. There exist points $p_0$ and $p_1$ in $P$ such that there exists a subset $P' \subset P$ with
$|P'| \ge N^{1 - c\epsilon}$ and s.t. $P' \subset \{\ell_0 \cap \ell_1 \mid \ell_0 \in L(p_0), \ell_1 \in L(p_1)\}$. Here $c > 0$ is some absolute
constant.

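Proof. For $p \in P$ define the set of points that lie on some line through $p$ to be

\[ \Gamma_2(p) = \{p' \in P \mid \exists \ell \in L \text{ s.t. } p, p' \in \ell\} \]

(these are the vertices of distance two from $p$ in the incidence graph). We are looking for a pair
$p_0, p_1$ with large $|\Gamma_2(p_0) \cap \Gamma_2(p_1)|$. To find them we will consider the expected size of this
intersection for a random pair:

\[ E_{p_0,p_1}[|\Gamma_2(p_0) \cap \Gamma_2(p_1)|] \gtrsim \frac{1}{N^2} \sum_{q \in P} \sum_{\ell_0, \ell_1 \in L(q)} |P(\ell_0)| \cdot |P(\ell_1)| = \frac{1}{N^2} \sum_{q \in P} \Big( \sum_{\ell \in L(q)} |P(\ell)| \Big)^2 \]

\[ \ge \frac{1}{N^3} \Big( \sum_{q \in P} \sum_{\ell \in L(q)} |P(\ell)| \Big)^2 = \frac{1}{N^3} \Big( \sum_{\ell \in L} |P(\ell)|^2 \Big)^2 \ge \frac{1}{N^5} |I(P, L)|^4 \ge \frac{1}{N^5} \left(N^{3/2 - 2\epsilon}\right)^4 \ge N^{1 - c\epsilon}, \]

where, in the chain of inequalities, we used Cauchy-Schwarz twice. Thus, we can pick $p_0, p_1$
so that the expectation is achieved and set $P' = \Gamma_2(p_0) \cap \Gamma_2(p_1)$ to be the required set. □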

Notice that, by our assumption on $L(p)$ we have that $|I(P', L)| \gg N^{3/2 - 2\epsilon}$ and so we
have not lost anything by replacing $P$ with $P'$. If we draw a picture of the lines through
$p_0$ and the lines through $p_1$ we get a skewed grid that contains the large set $P'$. Our next
goal is to 'straighten-out' this grid so that the lines through $p_0$ are parallel to the $X$ axis
and the lines through $p_1$ are parallel to the $Y$ axis. This will be obtained using a projective
transformation sending $p_0$ and $p_1$ to the line at infinity.



3.4.2 Projective space over F 

Since we are working over a finite field it makes sense to stop for a minute to define the
basic properties of projective space. It will help to keep in mind the mental picture of real
projective space in which we place the real plane on a slice $z = 1$ in three dimensional space
and then project points to the half sphere by passing a line through the origin.

More accurately, the construction of $d$-dimensional projective space over $F$ is as follows.
Take the space $F^{d+1} \setminus \{0\}$ and identify two non-zero vectors $x, y \in F^{d+1}$ if there exists a non
zero $\lambda \in F$ such that $x = \lambda y$. Call the resulting space $PF^d$ or the $d$-dimensional projective
space. Points in $PF^d$ are given using $d + 1$ homogenous coordinates $x = (x_0 : \ldots : x_d)$ with
each point having exactly $p - 1$ different homogenous coordinates. The 'regular' or 'affine'
$d$-dimensional space $F^d$ can be embedded into $PF^d$ by sending $x = (x_1, \ldots, x_d) \in F^d$ to
$x' = (1 : x_1 : x_2 : \ldots : x_d) \in PF^d$. Notice that, since the first coordinate is fixed to one, two
different vectors map to two different points (the zero vector goes to $(1 : 0 : \ldots : 0)$ which
is non-zero!). Using this embedding, we call the points with homogenous coordinates having
$x_0 = 0$ points at infinity. The set of all such points is called the hyperplane at infinity and is
another projective space of dimension smaller by one. For example, the points at infinity in
$PF^2$ form a projective line $PF^1$ called the line at infinity.

To get a feeling for these concepts consider the following example. Let $\ell$ be a line in
$F^2$. Suppose $\ell$ is given by the equation $aX + bY + c = 0$ with $a, b, c \in F$. Now embed $F^2$
in $PF^2$ using three homogenous coordinates $(X : Y : Z)$ so that the points at infinity are
those with $Z = 0$. A point $(X, Y)$ on $\ell$ will map to $(X : Y : 1)$ and will satisfy the equation
$aX + bY + cZ = 0$. Notice that homogenous equations do not care about the choice of homogenous
coordinates and so it makes sense to look at their common solutions in projective space. Thus,
we can identify the line $\ell$ with the line $\ell'$ in projective space given by the homogenous equation
$aX + bY + cZ = 0$. The affine points (points that are not at infinity) on $\ell'$ are precisely those
that come from points in $\ell$. There is however a new point, at infinity, given by $(-b : a : 0)$
(or any of its non-zero multiples). At least one of $a, b$ is non zero and so this makes sense.
Notice that the coordinates of this point correspond to the direction of $\ell$. This means that,
if we take another line $\ell_2$ in the same direction as $\ell$ and embed it into $PF^2$ it will also contain
the same point at infinity! Thus, lines in the same direction intersect at a fixed point at
infinity corresponding to their direction.
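The example above can be reproduced with a few lines of modular arithmetic. In the sketch below a projective line $aX + bY + cZ = 0$ over $\mathbb{F}_p$ is stored as the triple $(a, b, c)$, and the intersection of two lines is computed with a cross product, a standard device that is not taken from the text. The prime $p = 7$, the two sample lines and the helper names are illustrative.

```python
p = 7  # small prime, illustrative

def cross(u, v):
    """Cross product mod p: the common point of two distinct projective lines."""
    return ((u[1] * v[2] - u[2] * v[1]) % p,
            (u[2] * v[0] - u[0] * v[2]) % p,
            (u[0] * v[1] - u[1] * v[0]) % p)

def normalize(x):
    """Rescale homogeneous coordinates so that the first non-zero entry is 1."""
    lead = next(c for c in x if c % p)
    inv = pow(lead, -1, p)            # modular inverse (Python 3.8+)
    return tuple((c * inv) % p for c in x)

# Two parallel affine lines 2X + 3Y + 1 = 0 and 2X + 3Y + 5 = 0, direction (-3, 2).
l1 = (2, 3, 1)
l2 = (2, 3, 5)
meet = normalize(cross(l1, l2))
print(meet, normalize((-3 % p, 2, 0)))   # (1, 4, 0) (1, 4, 0)
```

The two lines have no common affine point, but in $PF^2$ they meet at the point at infinity $(-b : a : 0)$ determined by their direction.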

One last thing we need to consider are linear mappings over $PF^2$. These can be given
by any invertible $3 \times 3$ matrix and act on the points of $PF^2$ in the obvious way. Notice that such a
mapping may take points not at infinity to the line at infinity and vice versa. Also notice
that such mappings map lines to lines.

Let us go back to our set of points $P'$ and the lines $L$. We can embed these into $PF^2$ and
then perform a linear transformation taking $p_0$ to the point $(1 : 0 : 0)$ at infinity and $p_1$ to
$(0 : 1 : 0)$ at infinity. By our previous discussion one can check that, considering the 'affine'
points (those with $Z = 1$) after the transformation, the lines through $p_0$ are now parallel to
the $X$ axis and the lines through $p_1$ are parallel to the $Y$ axis. There might be some points
in $P'$ that were moved to infinity but all of those must lie on a single line passing through $p_0$
and $p_1$ and so there are at most $N^{1/2 + 2\epsilon}$ of those and we can safely ignore them. This means
that, after the projective transformation, most of the set $P'$ is in the 'affine' part ($Z = 1$) and
so we can go back to $F^2$ (discarding the $Z = 1$ coordinate) and we now have that the set $P'$
is contained in a grid $A \times B$ with $|A|, |B| \le N^{1/2 + 2\epsilon}$.

3.4.3 Counting incidences on the grid 

Renaming $N$ to be $N^{1 + \epsilon'}$ for some $\epsilon' > 0$ and using the projective transformation above we
see that Theorem 3.4.2 will follow from the following claim:

Claim 3.4.4 (ST over a grid). Let $P, L$ be sets of at most $N$ points/lines and suppose
$P \subset A \times B$ with $|A|, |B| \le N^{1/2}$. If $p^\alpha \le N \le p^{2-\alpha}$ for sufficiently small $\alpha > 0$, then
$|I(P, L)| \lesssim N^{3/2 - \epsilon}$.



Proof. Our goal will be to reduce to Corollary 3.4.1. Our grid is given by 'rows' $b \in B$ and
'columns' $a \in A$. For each $b \in B$ let $R(b) = P \cap (A \times \{b\})$ denote the set of points in $P$ that
have $Y$ coordinate equal to $b$. Denote also by $H(b) = \{\ell \in L \mid \ell \cap R(b) \neq \emptyset\}$ the set of lines
that pass through some point in $R(b)$. We can ignore the few lines that are parallel to either
the $X$ axis or the $Y$ axis.

The first step is to find two rows $b_0$ and $b_1$ such that many lines pass through both $R(b_0)$
and $R(b_1)$. This will be obtained, again, using a probabilistic argument. Notice that in the
inequalities below we use the fact that each line can intersect $R(b)$ in at most one point for
each $b \in B$.

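\[ E_{b_0,b_1}[|H(b_0) \cap H(b_1)|] = \frac{1}{|B|^2} \sum_{b_0,b_1 \in B} \sum_{\ell \in L} \sum_{p \in R(b_0)} \sum_{q \in R(b_1)} 1_{p \in \ell} \cdot 1_{q \in \ell} = \frac{1}{|B|^2} \sum_{\ell \in L} \sum_{p,q \in P} 1_{p \in \ell} \cdot 1_{q \in \ell} \]

\[ \ge \frac{1}{|B|^2 N} \Big( \sum_{\ell \in L} \sum_{p \in P} 1_{p \in \ell} \Big)^2 = \frac{|I(P, L)|^2}{|B|^2 N}. \]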

Therefore, we can find two elements in $B$, w.l.o.g. take these to be $b = 0$ and $b = 1$,
such that $|H(0) \cap H(1)| \gtrsim N^{1-c\epsilon}$ for some constant $c > 0$. Let
$L' = H(0) \cap H(1)$ be the set of lines that contain both a point with $b = 0$ and a point with
$b = 1$ in $P$. As before, we could have removed all lines with fewer than $N^{1/2-2\epsilon}$ points
on them and so we can assume $|I(P,L')| \gtrsim N^{3/2-c\epsilon}$.

Since at most $O(N)$ incidences can occur on the lines $b = 0$ or $b = 1$ we have that

\[ |\{(p,\ell) \in P \times L' \mid p \in \ell \text{ and } p \notin R(0) \cup R(1)\}| \gtrsim N^{3/2-c\epsilon}. \]

Consider a point $p = (a,b)$ with $b \notin \{0,1\}$ that lies on a line $\ell \in L'$. This line
passes through two points, say $(x_0,0)$ and $(x_1,1)$ with $x_0, x_1 \in A$, and so we have
$(a,b) = (1-b)(x_0,0) + b(x_1,1)$, which means that $(1-b)x_0 + bx_1 \in A$. This gives

\[ |\{(b,x_0,x_1) \in B \times A \times A \mid (1-b)x_0 + bx_1 \in A\}| \gtrsim N^{3/2-c\epsilon}. \]

So, there exists a subset $B' \subset B$ with $|B'| \gtrsim N^{1/2-2c\epsilon} > N^{1/4}$ such that
for all $b \in B'$ we have

\[ |\{(x_0,x_1) \in A \times A \mid (1-b)x_0 + bx_1 \in A\}| \gtrsim N^{1-2c\epsilon}. \]

This implies that $E(A, \frac{b}{1-b}A) \lesssim N^{1/2+O(\epsilon)} = |A|^{1+O(\epsilon)}$ for all
$b \in B'$, which contradicts Corollary 3.4.1 if we take $\epsilon$ small enough. Notice here that we
need to use the bound $p^{\alpha} < N < p^{2-\alpha}$, with $\alpha$ taken to be $\epsilon/C$ for some
large constant $C$, to satisfy the conditions of Corollary 3.4.1. □



3.5 Multi-source extractors 



An extractor (short for randomness extractor) is an algorithm that transforms 'weak' sources
of randomness into strong random bits. For example, suppose $X$ is a random variable
distributed uniformly over a set $S \subset \{0,1\}^n$ of size $|S| = 2^k$. Informally, $X$ contains
$k$ bits of randomness and so we would hope to use $X$ to generate $k$ (or close to $k$) unbiased
random bits. We do not know the set $S$, however, and so have to construct a single function
$f : \{0,1\}^n \to \{0,1\}^k$ such that $f(X)$ will be uniform for all such $X$. It is not hard to see
that such a function does not exist (even if we require the output to be only one random
bit). There are two different ways around this obstacle and both give rise to interesting
questions. One is to allow $f$ to use a small number of auxiliary random bits (independent of
$X$). Such a function $f$ is called a seeded extractor. We will talk more about these later when
we discuss applications of the finite field Kakeya problem. Another approach is to assume
some structure on the source $X$ (say, that the set $S \subset \{0,1\}^n$ belongs to some 'nice' family
of sets). This restriction allows, in many interesting cases, for a single deterministic extractor.



3.5.1 Extractors for constant number of sources: BIW 

One well-studied class of deterministic extractors is for sources belonging to the class of
several independent blocks. In this family, the random source is partitioned into blocks
$X = (X_1, \ldots, X_t) \in (\{0,1\}^n)^t$ such that the different blocks are independent (as random
variables) and each contains some minimal amount of entropy. The right notion of entropy
(and the most commonly used) is min-entropy. The min-entropy of $X$, denoted $H_\infty(X)$, is
defined as the maximal $k$ such that $\Pr[X = x] \le 2^{-k}$ for all $x$ in the support of $X$. If $X$ is,
as above, uniform on a set of size $2^k$ (these are called flat sources) then it has min-entropy
$k$. Conversely, one can show that every source with min-entropy $k$ is a convex combination
of flat sources of min-entropy $k$ [CG88]. Thus, it is enough to argue about flat sources. A
deterministic $(k, \epsilon)$-extractor for $t$ sources is a function

\[ f : (\{0,1\}^n)^t \to \{0,1\}^m \]

such that for every $t$ independent random variables $X_1, \ldots, X_t \in \{0,1\}^n$, each of min-entropy
at least $k$, the output $f(X_1, \ldots, X_t)$ is $\epsilon$-close to the uniform distribution in statistical
distance (see footnote 2). Clearly $m$ is at most $t \cdot k$ and the goal is in general to output as many
bits as possible with error ($\epsilon$) as small as possible. In what follows, we will mostly talk about
extractors with one bit of output since this is usually the hard case (once you have one bit you can
usually get more).

Footnote 2: The statistical distance between two distributions is the $\ell_1$ distance of their probability vectors.
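The definitions of min-entropy, flat sources and statistical distance can be made concrete with a short Python sketch. The function names (`min_entropy`, `flat_source`, `l1_distance`) are illustrative choices and not notation from the text; distributions are represented as dictionaries from outcomes to probabilities, which is an assumption of this sketch.

```python
import math

def min_entropy(dist):
    """Min-entropy: -log2 of the largest point probability of the distribution."""
    return -math.log2(max(dist.values()))

def flat_source(support):
    """Uniform distribution on a finite set (a 'flat' source)."""
    support = list(set(support))
    return {x: 1.0 / len(support) for x in support}

def l1_distance(d1, d2):
    """l_1 distance between the probability vectors of two distributions."""
    keys = set(d1) | set(d2)
    return sum(abs(d1.get(x, 0.0) - d2.get(x, 0.0)) for x in keys)

# A flat source on 2^3 = 8 outcomes has min-entropy exactly 3.
X = flat_source(range(8))
print(min_entropy(X))
```

A flat source on a set of size $2^k$ has min-entropy $k$, matching the statement above; a point mass and a uniform bit are at $\ell_1$ distance one.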

A simple probabilistic argument shows that there are deterministic extractors (even for
two sources) that work for min-entropy $k \sim \log(n)$. However, explicit constructions are
hard to find. Today, the best constructions all use, in one way or another, tools from
incidences in finite fields (or, equivalently, the sum-product theorem). We will sketch the
first construction to make use of these tools. This is a result of Barak, Impagliazzo and
Wigderson [BIW06] and was the first explicit extractor for a constant number of sources that
worked for any linear min-entropy $k = \Omega(n)$. The idea is as follows: consider three independent
sources $X, Y, Z \in \{0,1\}^n$, all with min-entropy $\delta \cdot n$. Suppose $n$ is prime, and identify the
three variables with elements in the finite field $\mathbb{F} = GF(2^n)$, which does not contain nontrivial
subfields (there are some subtleties to discuss if $n$ is not prime but we will not go there). Let
$W = X + Y \cdot Z$ be computed over $\mathbb{F}$. Using the Szemeredi-Trotter theorem we can show that $W$ is
close to having min-entropy at least $(\delta + \epsilon)n$ for some small positive $\epsilon$. We will prove this
below with entropy replaced by set-size. Once we know this, we can iterate this construction and take
the functions:

\[ f_1(X_1, X_2, X_3) = X_1 + X_2 X_3, \]
\[ f_2(X_1, \ldots, X_9) = (X_1 + X_2 X_3) + (X_4 + X_5 X_6) \cdot (X_7 + X_8 X_9), \]

etc., and prove by induction that after a constant number of steps (depending on $\delta, \epsilon$) we will
get a distribution that is close to uniform (the last step that goes from high min-entropy to
close-to-uniform requires a slightly different argument). Let us now prove a set-size variant
of the lemma at the heart of this argument (the generalization to min-entropy is left as an
exercise). The proof will work over any field $\mathbb{F}$ in which the Szemeredi-Trotter type bound
$I(P, L) \lesssim N^{3/2-\epsilon}$ holds. As was mentioned before, even though we proved this bound over
finite prime fields, the proof holds over any field that does not contain large subfields (which
is relevant w.r.t. the construction described above).

Lemma 3.5.1. Let $A, B, C \subset \mathbb{F}$ be subsets of size $|\mathbb{F}|^{\alpha} < N < |\mathbb{F}|^{1-\alpha}$ of a
field $\mathbb{F}$ in which the Szemeredi-Trotter bound holds. Then $|A + BC| > N^{1+\epsilon}$, with
$\epsilon > 0$ depending only on $\alpha$.

Proof. Let

\[ S(x) = |\{(a,b,c) \in A \times B \times C \mid a + bc = x\}| \]

denote the 'weight' of $x$ in the distribution $A + BC$ (i.e., when we sample three independent
samples from $A, B, C$ and compute $a + bc$). We have

\[ \sum_x S(x) = N^3. \qquad (3.1) \]
On the other hand, if we assume in contradiction that $|A + BC| < N^{1+\epsilon}$, we get

Now, if we define $T = \{x \mid S(x) > N^{2-2\epsilon}\}$ and using the two inequalities above we get
This implies

\[ |\{(a,b,c,x) \in A \times B \times C \times T \mid a + bc = x\}| > N^{3-4\epsilon}. \qquad (3.3) \]
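For concreteness, here is one way the counting behind (3.3) can be carried out. This is a sketch under the contradiction hypothesis $|A + BC| < N^{1+\epsilon}$ and for $N$ large enough that $N^{\epsilon} \ge 2$; it is one possible route and is not claimed to be the exact chain of displayed inequalities intended in the proof. It uses $\sum_x S(x) = |A||B||C| = N^3$ and the observation that $S(x) \le N^2$ for every $x$, since $b$ and $c$ determine $a = x - bc$.

```latex
% Cauchy-Schwarz over the support A + BC of S:
\sum_x S(x)^2 \;\ge\; \frac{\big(\sum_x S(x)\big)^2}{|A + BC|} \;>\; \frac{N^6}{N^{1+\epsilon}} \;=\; N^{5-\epsilon}.

% Terms outside T = \{x : S(x) > N^{2-2\epsilon}\} contribute little:
\sum_{x \notin T} S(x)^2 \;\le\; N^{2-2\epsilon} \sum_x S(x) \;=\; N^{5-2\epsilon}
\quad\Longrightarrow\quad
\sum_{x \in T} S(x)^2 \;>\; N^{5-\epsilon} - N^{5-2\epsilon} \;\ge\; \tfrac{1}{2} N^{5-\epsilon}.

% Using S(x) \le N^2 termwise:
|\{(a,b,c,x) \in A \times B \times C \times T : a + bc = x\}|
\;=\; \sum_{x \in T} S(x) \;\ge\; \frac{1}{N^2} \sum_{x \in T} S(x)^2 \;>\; \tfrac{1}{2} N^{3-\epsilon} \;\ge\; N^{3-4\epsilon}.
```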

This can be viewed as a bound on line/point incidences by defining a set of points $P = C \times T$
and a set of lines $L = \{\ell_{a,b}\}$ with $\ell_{a,b}$ defined by the equation $a + Xb = Y$ for all
$a \in A$, $b \in B$. The number of lines/points is at most $N^{2+2\epsilon}$ and the number of incidences
is at least $N^{3-4\epsilon}$. If $\epsilon$ is sufficiently small this will contradict Szemeredi-Trotter. □
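The quantities in the proof are easy to compute by brute force in a small prime field. The sketch below is only an illustration: it works in $\mathbb{Z}_p$ for a prime $p$ (one example of a field without nontrivial subfields), and the helper names `sum_product_set` and `weights` are hypothetical, not taken from [BIW06].

```python
def sum_product_set(A, B, C, p):
    """Return the set {a + b*c mod p : a in A, b in B, c in C} for a prime p."""
    return {(a + b * c) % p for a in A for b in B for c in C}

def weights(A, B, C, p):
    """S(x) = number of triples (a, b, c) in A x B x C with a + b*c = x (mod p)."""
    S = {}
    for a in A:
        for b in B:
            for c in C:
                x = (a + b * c) % p
                S[x] = S.get(x, 0) + 1
    return S

# Example with N = 10 elements in each set, inside the field Z_101.
p = 101
A = B = C = list(range(1, 11))
S = weights(A, B, C, p)
print(len(sum_product_set(A, B, C, p)), sum(S.values()))
```

The second printed number is $N^3$, as in (3.1); the first is $|A + BC|$, which the lemma asserts must exceed $N^{1+\epsilon}$ in the stated size range.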



3.5.2 Bourgain's two source extractor 

When the number of sources is two (the smallest possible) much less is known. Suppose
we want a two-source extractor for min-entropy $k$ that outputs a single bit (with some fixed
small $\epsilon$). A probabilistic argument shows that this can be done with $k \sim \log(n)$. A simple
explicit construction exists when $k > n/2$ (take the inner product modulo two) [CG88]. For
a long time this was the best known explicit construction. This changed a few years
back when Bourgain [Bou05] showed how to use the ST theorem to construct an extractor
for two sources of min-entropy $k = (1/2 - \epsilon)n$ for some positive $\epsilon$. It is an open problem
to give an explicit construction of a two-source extractor for min-entropy significantly less
than $n/2$. There are constructions of weaker objects called dispersers for two sources that
only output a bit that is non-constant (i.e., a bit that is equal to both zero and one with
some positive probability). These constructions work for min-entropy as low as k =
[BBW06, BKS+05]. These constructions use many tools, among which are those that
we have developed here. We will now show Bourgain's construction with one bit of output (it
is possible to extract more bits).

We start with the analysis of the inner product extractor, which works for min-entropy
larger than $n/2$. Recall that it is enough to consider two 'flat' sources $A, B \subset \{0,1\}^n$. To
bound the distance of $\langle A, B \rangle$ from the uniform distribution on one bit it is enough to bound
the following quantity, which we will refer to as the bias:

\[ \mathrm{bias}(A,B) = \left| \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} (-1)^{\langle a,b \rangle} \right|. \]
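To bound the bias we use some Cauchy-Schwarz calculations:

\[
\begin{aligned}
\mathrm{bias}(A,B)
 &\le \frac{1}{|A||B|} \sum_{a \in A} \Big| \sum_{b \in B} (-1)^{\langle a,b \rangle} \Big| \\
 &\le \frac{1}{|A|^{1/2}|B|} \Big( \sum_{a \in A} \Big| \sum_{b \in B} (-1)^{\langle a,b \rangle} \Big|^2 \Big)^{1/2} \\
 &\le \frac{1}{|A|^{1/2}|B|} \Big( \sum_{a \in \{0,1\}^n} \sum_{b,b' \in B} (-1)^{\langle a, b-b' \rangle} \Big)^{1/2} \\
 &= \frac{1}{|A|^{1/2}|B|} \Big( 2^n \sum_{b \in B} 1 \Big)^{1/2}
  = \Big( \frac{2^n}{|A||B|} \Big)^{1/2}.
\end{aligned}
\]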



The bias is equal to the difference between the probability that $\langle A, B \rangle$ is equal to one and
the probability that it is equal to zero. Thus, if $|A||B| > C \cdot 2^n$ then the distance of
$\langle A, B \rangle$ from the uniform distribution will be roughly $1/\sqrt{C}$. This shows that the inner
product function is a $(k, \epsilon)$ extractor for $k \gg \frac{n}{2} + \log(1/\epsilon)$.

But how can the above calculation be useful if we wish to handle smaller entropy? Clearly, 
the inner product function is not enough since we can take A and B to be orthogonal subspaces 
of dimension n/2 each. Can we fix our construction to avoid such bad examples? 
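Both the Cauchy-Schwarz bound and the orthogonal-subspace counterexample can be checked by brute force for small $n$. The following Python sketch is only an illustration; the helper names `inner`, `bias` and `span` are choices made here, and vectors are represented as 0/1 tuples.

```python
from itertools import product

def inner(a, b):
    """Inner product modulo two of two equal-length 0/1 tuples."""
    return sum(x & y for x, y in zip(a, b)) % 2

def bias(A, B):
    """|average of (-1)^<a,b>| over a in A, b in B, for flat sources A and B."""
    total = sum(1 - 2 * inner(a, b) for a in A for b in B)
    return abs(total) / (len(A) * len(B))

def span(vectors, n):
    """All GF(2)-linear combinations of the given vectors in {0,1}^n."""
    out = {tuple([0] * n)}
    for v in vectors:
        out |= {tuple(x ^ y for x, y in zip(u, v)) for u in out}
    return sorted(out)

n = 4
cube = list(product([0, 1], repeat=n))
# Orthogonal subspaces of dimension n/2: the inner product is constant (zero).
A = span([(1, 0, 0, 0), (0, 1, 0, 0)], n)
B = span([(0, 0, 1, 0), (0, 0, 0, 1)], n)
print(bias(A, B))        # 1.0
print(bias(cube, cube))  # 0.0625, below the bound (2^n / (|A||B|))^(1/2) = 0.25
```

Here $|A||B| = 2^n$ exactly, so the bound $(2^n/|A||B|)^{1/2} = 1$ is trivial, and the bias is indeed as bad as possible.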

The first step is to observe that, in the calculation above, one can replace the set size
of $A, B$ with a more refined quantity. For a distribution $\mu$ on some finite set $\Omega$ (i.e., $\mu$ is a
function from $\Omega$ to $\mathbb{R}_{\ge 0}$ with sum of values equal to one) we will denote the $\ell_2$-energy of
$\mu$ by

\[ E(\mu) = \Big( \sum_{x \in \Omega} \mu(x)^2 \Big)^{-1}. \]

Notice that if $\mu$ is a uniform distribution on some subset $A$ then $E(\mu) = |A|$. Notice also
that our old notation for additive energy,
$E(A,B) = \frac{|A|^2 |B|^2}{|\{(a,b,a',b') \mid a + b = a' + b'\}|}$, satisfies $E(A,B) = E(\mu_{a+b})$,
where $\mu_{a+b}$ is the distribution obtained by sampling two independent variables $a \in A$ and
$b \in B$ uniformly and then outputting $a + b$. Another interpretation of $E(\mu)$ is as the inverse
of the 'collision probability' $cp(\mu) = \sum_x \mu(x)^2$, which is the probability of two independent
copies of $\mu$ being equal to each other. We can similarly define bias for distributions as

\[ \mathrm{bias}(\mu_1, \mu_2) = \Big| \mathbb{E}_{x \sim \mu_1,\, y \sim \mu_2} \big[ (-1)^{\langle x,y \rangle} \big] \Big|. \]



It is straightforward to verify that the calculation above also works if we replace set size
with energy. That is:

\[ \mathrm{bias}(\mu_1, \mu_2) \le \Big( \frac{2^n}{E(\mu_1) E(\mu_2)} \Big)^{1/2}. \]

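What is more surprising is that the bound is not changed by much if we replace the two
distributions $\mu_1, \mu_2$ with the distributions of sums of several independent copies drawn from
the same distributions. To see this observe that

\[
\begin{aligned}
\mathrm{bias}(\mu_1, \mu_2)^2
 &= \Big| \mathbb{E}_{x_1 \sim \mu_1,\, x_2 \sim \mu_2} \big[ (-1)^{\langle x_1, x_2 \rangle} \big] \Big|^2
  \le \mathbb{E}_{x_1 \sim \mu_1} \Big| \mathbb{E}_{x_2 \sim \mu_2} \big[ (-1)^{\langle x_1, x_2 \rangle} \big] \Big|^2 \\
 &= \mathbb{E}_{x_1 \sim \mu_1} \mathbb{E}_{x_2, x_2' \sim \mu_2} \big[ (-1)^{\langle x_1, x_2 + x_2' \rangle} \big]
  = \mathrm{bias}(\mu_1, \mu_2 + \mu_2),
\end{aligned}
\]

where the ad hoc notation $\mu_2 + \mu_2$ means summing two independent copies drawn from $\mu_2$
(this is actually the convolution $\mu_2 * \mu_2$). Iterating this calculation four times we can obtain,
for example, the following claim.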

Claim 3.5.2. Let $A, B \subset \{0,1\}^n$. Then

\[ \mathrm{bias}(A, B) \le \mathrm{bias}(4 \cdot A, 4 \cdot B)^{1/16}, \]

where $4 \cdot A$ denotes the distribution of sums of four independent uniform variables from $A$
(similarly for $B$).
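The claim can be sanity-checked numerically for small $n$. The sketch below is an illustration under a few assumptions of its own: vectors of $\{0,1\}^n$ are encoded as integer bit masks (so addition in $\mathbb{Z}_2^n$ is XOR), distributions are dictionaries, and the names `uniform`, `convolve`, `bias` and `four_fold` are hypothetical helpers. The sets `A` and `B` are arbitrary examples.

```python
def uniform(A):
    """Uniform distribution on the set A, as a dict from element to probability."""
    return {a: 1.0 / len(A) for a in A}

def convolve(mu, nu):
    """Distribution of x XOR y (the sum in Z_2^n) for independent x ~ mu, y ~ nu."""
    out = {}
    for x, px in mu.items():
        for y, py in nu.items():
            out[x ^ y] = out.get(x ^ y, 0.0) + px * py
    return out

def bias(mu, nu):
    """|E (-1)^<x,y>| for independent x ~ mu, y ~ nu (vectors are bit masks)."""
    total = 0.0
    for x, px in mu.items():
        for y, py in nu.items():
            sign = -1.0 if bin(x & y).count("1") % 2 else 1.0
            total += sign * px * py
    return abs(total)

def four_fold(A):
    """Distribution 4.A: the sum of four independent uniform samples from A."""
    two = convolve(uniform(A), uniform(A))
    return convolve(two, two)

A = [1, 2, 4, 7, 8, 16, 31]   # example subsets of {0,1}^5, encoded as bit masks
B = [3, 5, 6, 9, 17, 24]
lhs = bias(uniform(A), uniform(B))
rhs = bias(four_fold(A), four_fold(B)) ** (1.0 / 16)
print(lhs <= rhs + 1e-9)
```

The small tolerance only absorbs floating-point rounding; the inequality itself is the content of the claim.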



Bourgain's approach to constructing a two-source extractor for min-entropy rate $1/2 - \epsilon$
is as follows: Construct a set $S \subset \{0,1\}^n$ such that for all subsets $A \subset S$ with
$|A| > |S|^{1/2-\epsilon}$ we have $E(4 \cdot A) \gg 2^{n/2}$. Then define the extractor
$f : S \times S \to \{0,1\}$ as $f(x,y) = \langle x, y \rangle$. Formally, we will need to identify $S$ with
some $\{0,1\}^{n'}$ but this will not be a problem. This will work since, if we take two (flat) sources
$A, B \subset S$ of size $|S|^{1/2-\epsilon}$ (this corresponds to min-entropy rate $> 1/2 - \epsilon$) then the
bias of their inner product is bounded by $(2^n / E(4 \cdot A) E(4 \cdot B))^{1/32}$, which will be close to
zero since

\[ E(4 \cdot A) E(4 \cdot B) \gg 2^n \]

(the power of 1/32 really doesn't change much).

Due to some technical difficulties in working over fields of characteristic two, we will
construct the set $S$ over the group $\mathbb{Z}_3^n$ instead of over $\mathbb{Z}_2^n$. To justify this 'switch'
observe that we can replace $(-1)$ in the summations above with a complex root of unity of order 3,
say $\omega = \exp(2\pi i/3)$, and define

\[ \mathrm{bias}_\omega(\mu_1, \mu_2) = \Big| \mathbb{E}_{x \sim \mu_1,\, y \sim \mu_2} \big[ \omega^{\langle x,y \rangle} \big] \Big|, \]

where the inner product is over $\mathbb{Z}_3$. Then, the same calculation as above gives

\[ \mathrm{bias}_\omega(\mu_1, \mu_2) \le \Big( \frac{3^n}{E(\mu_1) E(\mu_2)} \Big)^{1/2} \]

as well as

\[ \mathrm{bias}_\omega(A, B) \le \mathrm{bias}_\omega(4 \cdot A, 4 \cdot B)^{1/16}. \]

This means that, if we can construct a set $S$ such that every subset $A$ of size $|S|^{1/2-\epsilon}$
satisfies $E(4 \cdot A) \gg 3^{(1/2+\epsilon)n}$, we will get that, for all roots of unity of order 3, the
bias $\mathrm{bias}_\omega(A, B)$ is close to zero for all sets $A, B$ in $S$ of size $> |S|^{1/2-\epsilon}$. It is
not hard to show then that the distribution of $\langle a, b \rangle$, with $a \in A$, $b \in B$, is close to the
uniform distribution on three elements (so the output of the extractor is not a bit but rather a
uniform element in a set of size three).

The construction of $S \subset \mathbb{Z}_3^n$ is as follows. Suppose $n = 2p$, where $p$ is a prime number
(there are ways to handle other values of $n$ but this is a technicality). Let $\mathbb{F}$ be a finite field
of size $3^p$, so that $\mathbb{F}$ does not have large subfields (i.e., we can use the ST theorem in
$\mathbb{F}^2$). Identify $\mathbb{Z}_3^n$ with $\mathbb{F}^2$ by writing each element of $\mathbb{F}$ in some basis of
$\mathbb{F}$ over $GF(3)$. So addition in $\mathbb{F}^2$ is the same as coordinate-wise addition modulo 3 in
$\mathbb{Z}_3^n$. We can now define:

\[ S = \{(x, x^2) \mid x \in \mathbb{F}\} \subset \mathbb{F}^2 \cong \mathbb{Z}_3^n. \]

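We proceed with the analysis. Let $A \subset S$ be of size $|A| > |S|^{1/2-\epsilon} = 3^{p(1/2-\epsilon)}$.
Then there is a subset $A' \subset \mathbb{F}$ of the same size such that $A = \{(a, a^2) \mid a \in A'\}$;
abusing notation, we will also write $A$ for this subset of $\mathbb{F}$ below. We need to show
that $E(4 \cdot A) > 3^{p(1+\epsilon)} \gg 3^{n/2}$. For this purpose, define for all $x, y \in \mathbb{F}$ the set

\[ R_{x,y} = \Big\{ (a_1, a_2, a_3, a_4) \in A^4 \;\Big|\; \sum_i a_i = x,\ \sum_i a_i^2 = y \Big\}. \]

Notice that

\[ E(4 \cdot A) = \frac{|A|^8}{\sum_{x,y} |R_{x,y}|^2}. \]

Thus, if we could show that

\[ R = \sum_{x,y} |R_{x,y}|^2 \le |A|^{6-8\epsilon} \]

we would have that $E(4 \cdot A) \ge |A|^{2+8\epsilon} > 3^{p(1+\epsilon)}$, for sufficiently small $\epsilon$, as
required.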
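The sets $R_{x,y}$ and the resulting energy are easy to tabulate by brute force. The sketch below is a toy illustration only: as an assumption made for simplicity, it replaces the field of size $3^p$ by a prime field $\mathbb{Z}_q$ of odd characteristic, and the function names are hypothetical.

```python
from itertools import product

def solution_counts(A, q):
    """|R_{x,y}| for all (x, y): the number of (a1, a2, a3, a4) in A^4 with
    a1+a2+a3+a4 = x and a1^2+a2^2+a3^2+a4^2 = y (mod q)."""
    counts = {}
    for t in product(A, repeat=4):
        key = (sum(t) % q, sum(a * a for a in t) % q)
        counts[key] = counts.get(key, 0) + 1
    return counts

def energy_of_four_fold_sum(A, q):
    """|A|^8 / sum_{x,y} |R_{x,y}|^2: the energy of the sum of four independent
    uniform samples from the lifted set {(a, a^2) : a in A}."""
    counts = solution_counts(A, q)
    return len(A) ** 8 / sum(c * c for c in counts.values())

A = [0, 1, 2, 3, 5, 8]   # example subset of Z_101
q = 101
counts = solution_counts(A, q)
print(sum(counts.values()), energy_of_four_fold_sum(A, q))
```

The first number is $|A|^4$, the total number of 4-tuples. In odd characteristic each $|R_{x,y}|$ is at most $2|A|^2$ (fixing $a_1, a_2$ leaves at most two ordered solutions for $a_3, a_4$), so the energy is always at least $|A|^2/2$; the point of the argument in the text is to gain an extra factor $|A|^{8\epsilon}$ over this trivial estimate.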

We will bound the sum $R$ by partitioning it into two parts. Let $c > 0$ be a constant to
be chosen later. Define $T_1 = \{(x,y) \mid |R_{x,y}| \le |A|^{2-c\epsilon}\}$ and
$T_2 = \{(x,y) \mid |R_{x,y}| > |A|^{2-c\epsilon}\}$. Then $R = R_1 + R_2$, where $R_1$ is the sum of
$|R_{x,y}|^2$ over $(x,y) \in T_1$ and $R_2$ is the sum over the 'large' terms in $T_2$ (the rest of the
terms). $R_1$ is easy to bound: each term with $(x,y) \in T_1$ satisfies $|R_{x,y}| \le |A|^{2-c\epsilon}$,
and the total number of 4-tuples is $\sum_{x,y} |R_{x,y}| = |A|^4$, so

\[ R_1 \le |A|^{2-c\epsilon} \cdot \sum_{x,y} |R_{x,y}| = |A|^{6-c\epsilon} \le |A|^{6-8\epsilon} \]

if $c$ is sufficiently large.

To bound $R_2$ we will bound the size of the set $T_2$ by $|A|^{2-8\epsilon}$. If we can do that then we
will be done, since we can combine this bound with the trivial bound of $|A|^2$ on each of the
$|R_{x,y}|$'s to obtain $R_2 \le |A|^{6-8\epsilon}$ (to see the trivial bound of $|A|^2$ notice that fixing
$a_1, a_2$ allows us to solve for $a_3, a_4$). Suppose in contradiction that $|T_2| > |A|^{2-8\epsilon}$. By
definition, for each $(x,y) \in T_2$ there are at least $|A|^{2-c\epsilon}$ solutions
$(a_1, a_2, a_3, a_4) \in A^4$ to the equations

\[ a_1 + a_2 + a_3 + a_4 = x, \]
\[ a_1^2 + a_2^2 + a_3^2 + a_4^2 = y. \]

Let $T_3 = \{(x, (x^2 - y)/2) \mid (x,y) \in T_2\}$ (this is where we need the characteristic to be
different from two!) so that $|T_3| = |T_2|$ and such that for each $(x,y) \in T_3$ we have at least
$|A|^{2-c\epsilon}$ solutions $(a_1, a_2, a_3, a_4) \in A^4$ to the equations

\[ a_1 + a_2 + a_3 + a_4 = x, \]
\[ a_1 a_2 + a_1 a_3 + a_1 a_4 + a_2 a_3 + a_2 a_4 + a_3 a_4 = y. \]

We can now eliminate $a_4$ so that for all $(x,y) \in T_3$ we have at least $|A|^{2-c\epsilon}$ solutions
$(a_1, a_2, a_3) \in A^3$ to the single equation

\[ y = a_1 a_2 + a_2 a_3 + a_3 a_1 - (a_1 + a_2 + a_3)^2 + (a_1 + a_2 + a_3) \cdot x. \]

Thus we have the bound

\[ |\{(x, y, a_1, a_2, a_3) \in T_3 \times A^3 \mid y = a_1 a_2 + a_2 a_3 + a_3 a_1 - (a_1 + a_2 + a_3)^2 + (a_1 + a_2 + a_3) \cdot x\}| \ge |A|^{4-(8+c)\epsilon}. \]

We can now fix $a_3$ to some value $b \in A$ so that

\[ |\{(x, y, a_1, a_2) \in T_3 \times A^2 \mid y = a_1 a_2 + a_2 b + b a_1 - (a_1 + a_2 + b)^2 + (a_1 + a_2 + b) \cdot x\}| \ge |A|^{3-(8+c)\epsilon}. \]
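The two algebraic steps used here (passing from sums of squares to the sum of pairwise products, which needs division by two, and then eliminating $a_4$) can be verified exhaustively over a small field. The sketch below checks them over $\mathbb{Z}_p$ for a small odd prime $p$; this choice of field and the function name `check_elimination` are assumptions made for illustration.

```python
from itertools import product

def check_elimination(p):
    """Brute-force check over Z_p (p an odd prime) of the identities used to
    pass from T_2 to T_3 and then to eliminate a_4."""
    half = (p + 1) // 2  # multiplicative inverse of 2 modulo an odd p
    for a1, a2, a3, a4 in product(range(p), repeat=4):
        x = (a1 + a2 + a3 + a4) % p
        y = (a1 * a1 + a2 * a2 + a3 * a3 + a4 * a4) % p
        pairs = (a1*a2 + a1*a3 + a1*a4 + a2*a3 + a2*a4 + a3*a4) % p
        # (x^2 - y) / 2 equals the sum of pairwise products.
        if (x * x - y) * half % p != pairs:
            return False
        # Substituting a4 = x - (a1 + a2 + a3) gives the single equation.
        s = a1 + a2 + a3
        if (a1*a2 + a2*a3 + a3*a1 - s*s + s*x) % p != pairs:
            return False
    return True

print(check_elimination(7))
```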

This last quantity can be viewed as the set of incidences between the lines $\ell_{a_1,a_2}$,
$(a_1, a_2) \in A^2$, defined as
$\ell_{a_1,a_2} = \{(u,v) \mid v = a_1 a_2 + a_2 b + b a_1 - (a_1 + a_2 + b)^2 + (a_1 + a_2 + b) \cdot u\}$,
and the set of points $T_3$. The number of lines is clearly at most $|A|^2$ and the number of points
$|T_3|$ is at most $|A|^{2+c\epsilon}$, since we have $\sum_{x,y} |R_{x,y}| = |A|^4$ and so there can be at most
$|A|^{2+c\epsilon}$ summands larger than $|A|^{2-c\epsilon}$. Taking $\epsilon$ to be small enough we will get a
contradiction to the ST theorem, since the number of points/lines is roughly $|A|^2$ and the number of
incidences approaches $|A|^3$. This completes the proof.
Chapter 4

Kakeya sets



4.1 Kakeya sets in $\mathbb{R}^n$

The Kakeya problem in $\mathbb{R}^n$ deals with the most efficient way to 'pack' many tubes
($\epsilon$-neighborhoods of line segments) that point in different directions. As we shall see, this
question reduces to a discrete question about incidences of line segments pointing in
sufficiently separated directions. The starting point is the definition of a Kakeya set.

Definition 4.1.1 (Kakeya Set). A compact set $K \subset \mathbb{R}^n$ is a Kakeya set if it contains a unit
line segment in each direction. More formally, for every $x \in S^{n-1}$ there exists $y = f(x) \in K$
such that $\{y + tx \mid t \in [0,1]\} \subset K$.

It is known [Bes28] that Kakeya sets can have measure zero (we will not prove this
here). A more refined question has to do with the minimal dimension of a Kakeya set. For
simplicity, we will use the Minkowski dimension (also known as covering/box dimension) but
other notions (in particular Hausdorff dimension) are often studied in the literature. The
Minkowski dimension (which we will refer to simply as 'dimension' from now on) is defined
as follows. Let $B_\epsilon(K)$ denote the minimal number of balls of radius $\epsilon$ needed to cover the
(bounded) set $K \subset \mathbb{R}^n$. The dimension of $K$ is defined as

\[ \dim(K) = \limsup_{\epsilon \to 0} \frac{\log B_\epsilon(K)}{\log(1/\epsilon)} \]

(technically, this is the upper Minkowski dimension). Roughly speaking, if $\dim(K) \le d$
then $K$ can be covered by $\sim (1/\epsilon)^d$ balls of radius $\epsilon$, where the $\sim$ notation hides constants
that might depend on the dimension $n$. It is a good exercise at this point to verify that this
definition of dimension agrees with the usual definition of dimension for subspaces (intersected
with a unit ball) and, more generally, algebraic surfaces. For example, the dimension of a
line segment of length $L$ is 1 since it can be covered by $L(1/\epsilon)$ balls of radius $\epsilon$ and the factor
of $L$ disappears in the limit. Also, a set of positive measure in $\mathbb{R}^n$ must have dimension $n$.
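The exercise above can be explored numerically. The following Python sketch estimates the dimension at a single finite scale by counting occupied cells of an $\epsilon$-grid instead of covering balls (the two counts agree up to constants depending on $n$, which is an assumption of this illustration). The function names are hypothetical, and the sets are represented by finite samples of points.

```python
import math

def grid_cells_hit(points, eps):
    """Number of distinct eps-grid cells containing at least one of the points."""
    return len({tuple(math.floor(c / eps) for c in pt) for pt in points})

def box_dimension_estimate(points, eps):
    """log(#cells) / log(1/eps): a finite-scale proxy for the Minkowski dimension."""
    return math.log(grid_cells_hit(points, eps)) / math.log(1.0 / eps)

def sample_segment(start, end, count):
    """count + 1 evenly spaced points on the segment from start to end."""
    return [tuple(s + (e - s) * i / count for s, e in zip(start, end))
            for i in range(count + 1)]

segment = sample_segment((0.0, 0.0), (1.0, 0.5), 20000)
square = [(i / 199, j / 199) for i in range(200) for j in range(200)]
print(box_dimension_estimate(segment, 2.0 ** -10))  # close to 1
print(box_dimension_estimate(square, 2.0 ** -5))    # close to 2
```

At a fixed scale the estimate is only approximate, because the constant factors (such as the length $L$ of the segment) disappear only in the limit $\epsilon \to 0$.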
The Kakeya conjecture (sometimes called the Euclidean Kakeya conjecture) states that a
Kakeya set $K \subset \mathbb{R}^n$ must have dimension $n$, which is the highest possible. This conjecture
is open for $n \ge 3$ (we will prove the $n = 2$ case below) and is related to several important
questions in Analysis, PDEs and Number Theory. We refer the reader to the excellent survey
[Tao01] for more on these applications/connections.

For this section only, we will think of $n$ as constant and use our asymptotic notations
$\sim, \gtrsim, \lesssim$ to suppress constants depending on $n$ (these will disappear in the limit when
$\epsilon \to 0$). Thus, one can replace the quantity $B_\epsilon(K)$ with a slightly more convenient quantity
having to do with the number of grid points close to $K$. More formally, let $G_\epsilon = \epsilon\mathbb{Z}^n$ denote
the $\epsilon$-grid in $\mathbb{R}^n$. Notice that every point in $\mathbb{R}^n$ is at distance at most $\sqrt{n}\,\epsilon$ from
some grid point. Let $G_\epsilon(K)$ denote the number of points in $G_\epsilon$ that are at distance at most
$10\sqrt{n} \cdot \epsilon$ from $K$ (the constant 10 is arbitrary and is there just to give some wiggle room).
Thus, in our notations, $G_\epsilon(K) \sim B_\epsilon(K)$ and so we can use $G_\epsilon(K)$ from now on. We will
sometimes abuse notation and treat $G_\epsilon(K)$ as the set of points of distance at most
$10\sqrt{n} \cdot \epsilon$ from $K$.

Before moving on to the discretized setting mentioned above, we will prove the $n = 2$ case
of the Kakeya conjecture.

Theorem 4.1.2 (Davies [Dav71]). Let $K \subset \mathbb{R}^2$ be a Kakeya set. Then $\dim(K) = 2$.

Proof. Let $K'$ be the $\epsilon$-neighborhood of $K$ (i.e., all points of distance at most $\epsilon$ from $K$)
and notice that $G_\epsilon(K) \sim G_\epsilon(K')$. We will show
$G_\epsilon(K') \gtrsim \frac{1}{\epsilon^2} \cdot \frac{1}{\log(1/\epsilon)}$, which will prove the theorem.

Consider $\sim 1/\epsilon$ tubes of width $\epsilon$ with one endpoint at the origin and with the other
endpoints spread along the first quadrant part of the unit circle in $\mathbb{R}^2$. That is, take $\ell_j$
to be the line segment connecting the origin with $(\cos(\epsilon j \pi/2), \sin(\epsilon j \pi/2))$ and take $T_j$
to be its $\epsilon$-neighborhood. Since $K$ is a Kakeya set, we can 'shift' each of the tubes $T_j$ (without
changing its direction) so that they are contained in $K'$. Suppose we have already done that
and that for all $j$, $T_j \subset K'$.

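Notice that, for each $j$ we have $G_\epsilon(T_j) \gtrsim 1/\epsilon$ and that, for $i \ne j$, we have
$G_\epsilon(T_i \cap T_j) \lesssim \frac{1}{\epsilon |i - j|}$
(this is why we took tubes only in the first quadrant). Using Cauchy-Schwarz we get

\[
\begin{aligned}
\frac{1}{\epsilon^2} \lesssim \sum_j G_\epsilon(T_j)
 &= \sum_{x \in G_\epsilon(K')} \sum_j 1_{x \in G_\epsilon(T_j)}
  \le G_\epsilon(K')^{1/2} \Big( \sum_{x \in G_\epsilon(K')} \Big( \sum_j 1_{x \in G_\epsilon(T_j)} \Big)^2 \Big)^{1/2} \\
 &\lesssim G_\epsilon(K')^{1/2} \Big( \sum_{i,j} G_\epsilon(T_i \cap T_j) \Big)^{1/2}
  \lesssim G_\epsilon(K')^{1/2} \Big( \frac{\log(1/\epsilon)}{\epsilon^2} \Big)^{1/2}.
\end{aligned}
\]

Rearranging we get the bound

\[ G_\epsilon(K') \gtrsim \frac{1}{\epsilon^2 \log(1/\epsilon)}, \]

which gives $\dim(K) = 2$ when $\epsilon$ goes to zero in the definition of dimension. □

4.1.1 The n/2 bound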

We will now see a proof that gives a lower bound of $n/2$ on the dimension of Kakeya sets
in $\mathbb{R}^n$. This will also set up some of the notation for the next part, which will use additive
combinatorics to get a better bound of the form $(4/7)n$.

Similarly to the set of tubes $T_j$ used above, we will now need an $\epsilon$-separated set of directions
$\Omega \subset S^{n-1}$. Since we are ignoring constants depending on $n$ we can easily find such a set with
$|\Omega| \sim (1/\epsilon)^{n-1}$. Thus, if $K$ is a Kakeya set we have that for all $w \in \Omega$ there exists
$a_w \in \mathbb{R}^n$ such that the segment $\ell_w = \{a_w + tw \mid t \in [0,1]\} \subset K$. Let us denote by
$b_w = a_w + w$ the second 'endpoint' of the line segment in direction $w$. For each $w \in \Omega$ let
$a'_w, b'_w$ be the grid points (in $G_\epsilon$) closest to $a_w, b_w$. Consider the line segment $\ell'_w$
connecting $a'_w$ to $b'_w$. Since $\ell'_w$ is obtained from $\ell_w$ by moving its endpoints by at most
$\lesssim \epsilon$, the set $\Omega'$ of directions of the line segments $\ell'_w$ has size
$|\Omega'| \gtrsim (1/\epsilon)^{n-1}$. Let $A = \{a'_w \mid w' \in \Omega'\}$ and $B = \{b'_w \mid w' \in \Omega'\}$.
Then both $|A|, |B|$ are at most $G_\epsilon(K)$ (since $a_w, b_w$ are in $K$ and $a'_w, b'_w$ are the closest
grid points to them). On the other hand, we have $|B - A| \gtrsim |\Omega'|$ (since the differences between
$a'_w$ and $b'_w$ cover all directions in $\Omega'$). Since $|B - A| \le |A||B|$ we have
$|A||B| \gtrsim (1/\epsilon)^{n-1}$, which implies $G_\epsilon(K) \gtrsim (1/\epsilon)^{(n-1)/2}$. This means that
$\dim(K) \ge (n-1)/2$.

To go from $(n-1)/2$ to $n/2$ we use a tensoring argument: Observe that, if $K$ is a Kakeya set in
$\mathbb{R}^n$ then, for all $t \in \mathbb{N}$, $K^t \subset \mathbb{R}^{nt}$ is also a Kakeya set. It is also simple to verify
that $\dim(K^t) = t \cdot \dim(K)$ and so, using our previous bound on $K^t$, we get that
$t \cdot \dim(K) \ge (nt - 1)/2$. Dividing by $t$ and taking $t$ to infinity we get that $\dim(K) \ge n/2$.

In [Wol99], Wolff proved an even stronger bound of (n + 2)/2 for general n. We will 
not see this proof here and focus on later developments, starting with the work of Bourgain 
[Bou99], giving (increasingly higher) bounds of the form $\alpha n$ for $\alpha > 1/2$. These results use
ideas and tools from additive combinatorics. 

4.1.2 Additive Combinatorics methods 

The proof of the $n/2$ bound we saw above uses only the 'endpoints' of the line segments
(after moving them slightly so that they are on a grid). Without using other information one cannot
go beyond $n/2$, as there are sets of points $A, B$ on the grid with $|B - A| \sim |A||B|$ and this
is all we used in the proof. To go beyond this barrier we will also use, as a starting point,
the midpoints of the segments, that is, the points $(a_w + b_w)/2$. We saw that we can shift
$a_w, b_w$ by at most $\lesssim \epsilon$ so that they are on the grid $G_\epsilon$. It is easy to see that a similar
shifting argument can also put all three points $a'_w, b'_w$ and the midpoint $c'_w = (a'_w + b'_w)/2$ on
the grid $G_\epsilon$. The distance we need to shift the endpoints will grow by at most a constant factor,
which we do not care about. Define as before the sets $A, B$ to contain the points $a'_w, b'_w$
with $w' \in \Omega'$ (the new set of directions we obtain after the shifting). As before we have
$|\Omega'| \sim (1/\epsilon)^{n-1}$. We can also place all points in $A, B$ in an $O(\epsilon)$-neighborhood of
$G_\epsilon(K)$ and so we have $|A|, |B| \lesssim G_\epsilon(K)$. Let us denote $N = G_\epsilon(K)$.

We still know that $|A - B| \gtrsim |\Omega'| \gtrsim (1/\epsilon)^{n-1}$ is large. Now we can also incorporate the
midpoints to claim that, in some sense, the sumset $A + B$ is small! To see this consider
all sums of the form $(a'_w + b'_w)/2$ with $w' \in \Omega'$. These sums will all fall in a set of size
$\lesssim N = G_\epsilon(K)$ and so, if we assume the dimension of $K$ is at most $d$, this set will be of size
at most $\lesssim (1/\epsilon)^d$. Thus, the intuition is that, since the difference set is large, the sumset
cannot be too small and so we will get a contradiction if the dimension of $K$ is smaller than
some bound. There is, however, a serious difficulty. The sumset $\{a'_w + b'_w \mid w' \in \Omega'\}$ whose
size we want to argue about (we can discard the 1/2) is not really the sumset $A + B$ but
rather a sub-sumset determined by some fixed family of pairs (indexed by $\Omega'$). To see the
way around it we remind ourselves of the Balog-Szemeredi-Gowers theorem, which says that
if the sums over a dense family of pairs in $A \times B$ fall in a small set, then there are large subsets
$A', B'$ with small sumset. Thus, if the family of pairs $(a'_w, b'_w)$ with $w' \in \Omega'$ is dense, in the
sense that $|\Omega'| \ge (|A||B|)^{1-\delta}$ (for some small constant $\delta$), then we can hope to save the
situation in some way. This is indeed the case if we are shooting for a $(1/2 + \delta')n$ type of bound on
$\dim(K)$. To see this, notice that $|A|, |B| \lesssim N$ and, if we assume in contradiction that
$N \ll (1/\epsilon)^{(1/2+\delta')n}$, we get that $|\Omega'| \sim (1/\epsilon)^{n-1} \gtrsim N^{2-\delta}$ as required.

In [Bou99] Bourgain carries out the above plan with the aid of a modified version of
the BSG theorem tailored for this situation. Bourgain's proof was simplified considerably
by Katz and Tao [KT99, KT02], who also improved the constant $\alpha$ from Bourgain's original
13/25 to almost 0.596... We will see below the simplified proof which gives 4/7. Let us now
state the general reduction from the Kakeya dimension question to a simple-to-state question
in additive combinatorics. This more general reduction will allow us to use more points (not
just the midpoints), which will be useful in simplifying the proofs.

Definition 4.1.3 ($SD(R, \beta)$). Let $A, B \subset H$ be finite subsets of an abelian group $H$ with
$|A|, |B| \le N$. Let $\Gamma \subset A \times B$. Let $R \subset \mathbb{N}$ and suppose that for all $r \in R$ we have
$|\{a + rb \mid (a,b) \in \Gamma\}| \le N$. We say that the statement $SD(R, \beta)$ holds over $H$ if for
every $A, B$ and $\Gamma$ as above, we have $|\{a - b \mid (a,b) \in \Gamma\}| \le N^{\beta}$.

Lemma 4.1.4. Suppose $SD(R, \beta)$ holds over $\mathbb{R}^n$ for $R = \{1, 2, \ldots, r\}$ and $\beta > 1$. Then for
all Kakeya sets $K \subset \mathbb{R}^n$ we have $\dim(K) \ge n/\beta$.

Proof. Let $K \subset \mathbb{R}^n$ be a Kakeya set. We will treat $r$ as a constant (as $\epsilon$ will go to zero).
Consider, as before, an $\epsilon$-separated set of directions $\Omega \subset S^{n-1}$ of size $\sim (1/\epsilon)^{n-1}$ and
let $a_w, b_w \in K$ be the endpoints of a unit line segment in direction $w \in \Omega$ that is contained in
$K$. Fix $\epsilon > 0$ to be sufficiently small and let $N = G_\epsilon(K)$. We can move each pair $(a_w, b_w)$
by at most $O(\epsilon)$ to new points $(a'_w, b'_w)$ on the grid $G_\epsilon$ so that all combinations
$a'_w + j b'_w$ for all $j \in R$ fall in a set of size $\sim G_\epsilon(K)$ (similarly to what we did for sums).
Since the line segments $\ell_w$ were moved by $O(\epsilon)$ we have that the new set of directions
$w' = a'_w - b'_w$, denoted $\Omega'$, is also of size at least $\gtrsim (1/\epsilon)^{n-1}$. Therefore, for all
$j \in R$ we have

\[ |\{a'_w + j b'_w \mid w' \in \Omega'\}| \lesssim N. \]

Using the $SD(R, \beta)$ assumption we get that

\[ (1/\epsilon)^{n-1} \lesssim |\{a'_w - b'_w \mid w' \in \Omega'\}| \lesssim N^{\beta}, \]

which gives the required bound (after a tensoring argument). □

Thus, in order to prove the Kakeya conjecture it suffices to show that $SD(R, 1)$ holds for
some fixed set $R \subset \mathbb{N}$.

4.1.3 The 4n/7 bound

We will now prove that $SD(\{1,2\}, 7/4)$ holds over any abelian group $H$, which will give a
$4n/7$ bound on $\dim(K)$ (we will assume that the order of 1 in $H$ is larger than 2). Since we
want to bound the size of the set $\{a - b \mid (a,b) \in \Gamma\}$ we may assume w.l.o.g. that the difference
$a - b$ is distinct for each $(a,b) \in \Gamma$ (since removing edges will only decrease the bound $N$ on
the other sets in the definition). We are thus interested in bounding the number of edges in
$\Gamma$, that is, $|\Gamma|$.

The main ingredient in the proof is the notion of a 'gadget', which we now define. A
'gadget' will be a substructure in the graph $\Gamma$ with certain restrictions on linear combinations
on edges. More formally, a gadget $G$ is defined as a 4-tuple $G = (V_A, V_B, E, C)$ with

• $V_A = (a_1, \ldots, a_s)$, $V_B = (b_1, \ldots, b_r)$ two sets of formal variables.

• $E$ a subset of $V_A \times V_B$ (we call these 'edges').

• $C$ a set of constraints of the form $a_i + r b_j = a_{i'} + r' b_{j'}$ with $i, i' \in [s]$, $j, j' \in [r]$ and
$r, r'$ integers.

An example of a simple gadget is $G_1 = (V_A, V_B, E, C)$ with:

$V_A = \{a_1, a_2\}$, $V_B = \{b_1, b_2\}$,
$E = \{(a_1, b_1), (a_2, b_2)\}$, $C = \{a_1 + 2b_1 = a_2 + 2b_2\}$.

We say that a gadget $G$ appears in the graph $\Gamma \subset A \times B$ (with $A, B$ subsets of the abelian
group $H$) if we can map $V_A, V_B$ to subsets of $A, B$ such that the set of edges $E$ is contained
in the set of edges induced by $\Gamma$ and the constraints in $C$ are satisfied. For example, if we
take $A = B = \{1, 2, 3, 4, 5\}$ and $\Gamma = A \times B$ then the gadget $G_1$ above appears in $\Gamma$ by taking
$a_1 = 1$, $a_2 = 3$, $b_1 = 5$, $b_2 = 4$ (since $1 + 2 \cdot 5 = 3 + 2 \cdot 4$).

We can also count the number of times a gadget appears in $\Gamma$ in the obvious way as the
number of different ways to map $V_A, V_B$ into subsets of $A, B$ so that the edges/constraints
are satisfied. For example, we will show that, if we take $\Gamma \subset A \times B$ such that for all edges
$(a,b) \in \Gamma$ we have $a + 2b \in H' \subset H$, then $G_1$ will appear in $\Gamma$ at least $|\Gamma|^2/|H'|$ times. This
fact follows from a Cauchy-Schwarz calculation that is given by the following lemma.

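Lemma 4.1.5. Let $W$ be a finite set and let $f : W \to Z$ be a mapping to some other finite
set $Z$. Then

\[ |\{(v,u) \in W^2 \mid f(v) = f(u)\}| \ge |W|^2/|Z|. \]

Proof. The size of the set is given by the sum

\[
\sum_{u,v \in W} 1_{f(u) = f(v)}
 = \sum_{u,v \in W} \sum_{z \in Z} 1_{f(u) = z} \cdot 1_{f(v) = z}
 = \sum_{z \in Z} \Big( \sum_{u \in W} 1_{f(u) = z} \Big)^2
 \ge \frac{1}{|Z|} \Big( \sum_{z \in Z} \sum_{u \in W} 1_{f(u) = z} \Big)^2
 = \frac{|W|^2}{|Z|}.
\]

□

We can apply this lemma to the gadget $G_1$ as follows. Take the function $f : \Gamma \to H'$ to
be $f(a,b) = a + 2b$. We then get the promised bound $|\Gamma|^2/|H'|$.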
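The collision count in Lemma 4.1.5 can be computed directly for the running example $A = B = \{1, \ldots, 5\}$, $\Gamma = A \times B$. The Python sketch below is an illustration; the helper name `count_collisions` is a choice made here, and, as in the lemma, ordered pairs with $u = v$ are included in the count.

```python
from collections import Counter

def count_collisions(W, f):
    """Number of ordered pairs (u, v) in W x W with f(u) == f(v)."""
    fibers = Counter(f(w) for w in W)
    return sum(c * c for c in fibers.values())

A = B = range(1, 6)
Gamma = [(a, b) for a in A for b in B]

def f(edge):
    """The map (a, b) -> a + 2b whose collisions are appearances of G_1."""
    return edge[0] + 2 * edge[1]

image = {f(e) for e in Gamma}
appearances = count_collisions(Gamma, f)
lower_bound = len(Gamma) ** 2 / len(image)
print(appearances, len(image), lower_bound)
```

Here the image of $f$ has 13 elements and there are 55 collisions, which is consistent with the lower bound $25^2/13 \approx 48.1$.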
Recall that, in the definition of $SD(R, \beta)$ we have a bound $N$ on the sizes $|A|, |B|$ as well
as on the sizes of each of the sets $\{a + rb \mid (a,b) \in \Gamma\}$. Once we have a gadget $G$ and we
can give a lower bound on the number of times it appears in $\Gamma$, the next step is to give a
corresponding upper bound on the number of times $G$ appears in $\Gamma$ in terms of $N$. This will
be done by showing that we can 'encode' each gadget using a few elements, each in a set of
size at most $N$. For example, the gadget $G_1$ can be encoded as $(a_1, a_2, a_1 + 2b_1)$ since, from
this triple, we can recover both $b_1$ and $b_2$ (using the fact that $a_1 + 2b_1 = a_2 + 2b_2$). Since all
three elements in this triple are in a set of size at most $N$ we get that there can be at most $N^3$
appearances of $G_1$ in $\Gamma$. Combining this with the lower bound obtained from Lemma 4.1.5
we get $|\Gamma|^2/N \le N^3$ or $|\Gamma| \le N^2$. This bound on $\Gamma$ is not very interesting and we proved it
just to give an idea of the proof technique. To prove the claimed 4/7 bound we need to get
a bound of $|\Gamma| \le N^{7/4}$, which will require a more elaborate gadget.

A more elaborate gadget 

Consider the gadget $G_{4/7}$ given by

$$V_A = \{a_1, a_2\}$$

$$V_B = \{b_1, b_2, b_3\}$$

$$E = \{(a_1,b_1), (a_1,b_2), (a_2,b_2), (a_2,b_3)\}$$

$$C = \{a_1 + 2b_1 = a_2 + 2b_3\}.$$

Let $\Gamma \subseteq A \times B$ be as in the definition of $SD(\{1,2\}, \beta)$ so that $|A|, |B| \leq N$ and
$|\{a + rb \mid (a,b) \in \Gamma\}| \leq N$ for $r = 1, 2$. Our first step is to give a lower bound on the number of
appearances of $G_{4/7}$ in $\Gamma$. For this purpose consider the set

$$M = \{((a,b),(a',b')) \in \Gamma^2 \mid a = a'\}$$

(the set of paths of length two). Using Lemma 4.1.5 we have $|M| \geq |\Gamma|^2/N$. Let $f : M \to H^2$
be defined as

$$f((a,b),(a',b')) = (b', a + 2b).$$

Notice that each collision of $f$ gives an appearance of the gadget $G_{4/7}$. Since the image of $f$
is contained in a set of size $N^2$, Lemma 4.1.5 gives at least

$$|M|^2/N^2 \geq |\Gamma|^4/N^4$$

collisions/appearances of $G_{4/7}$.

We now give an upper bound using the 'encoding' argument. Here we will use the fact,
mentioned above, that w.l.o.g. the differences on the edges of $\Gamma$ are distinct. This will be useful
since knowing the difference $a - b$ on some edge identifies this edge and so also identifies its
two endpoints.

Let $G' = (a_1, a_2, b_1, b_2, b_3)$ be an appearance of $G_{4/7}$ (so that all edges/constraints are
satisfied). We will show that $G'$ can be recovered from the triple $(b_3, a_1 + b_2, a_1 + b_1)$. Since
we have the bound $|\{a + b \mid (a,b) \in \Gamma\}| \leq N$ we know that there are at most $N^3$ such triples,
which will give the same upper bound on the number of appearances of $G_{4/7}$. We now
describe the decoding. The first step is to decode

$$a_2 - b_1 = (a_1 + b_1) - 2b_3$$

(using the constraint $a_1 + 2b_1 = a_2 + 2b_3$). Then we compute

$$b_1 - b_2 = (a_1 + b_1) - (a_1 + b_2).$$

Using these two we can compute

$$a_2 - b_2 = (a_2 - b_1) + (b_1 - b_2).$$

Now, using the distinctness of differences we can recover $a_2, b_2$ (since it is an edge in $\Gamma$) and
from them the rest of the vertices in $G'$. Putting the two bounds together gives $|\Gamma|^4/N^4 \leq N^3$ and hence the required

$$|\Gamma| \leq N^{7/4}.$$
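The decoding step can be traced on a toy instance. The sketch below is only an illustration: the four-edge graph `GAMMA` over the integers and all function names are hypothetical, chosen so that the differences $a - b$ on the edges are distinct.

```python
from itertools import product

# Hypothetical toy graph Gamma; its edges have distinct differences a - b.
GAMMA = {(0, 0), (0, 3), (4, 3), (4, -2)}
EDGE_BY_DIFF = {a - b: (a, b) for (a, b) in GAMMA}
assert len(EDGE_BY_DIFF) == len(GAMMA)  # differences are indeed distinct

def appearances(gamma):
    """All (a1, a2, b1, b2, b3) satisfying the edges and the constraint of G_{4/7}."""
    found = []
    for (a1, b1), (x, b2), (a2, y), (z, b3) in product(gamma, repeat=4):
        # edges (a1,b1), (a1,b2), (a2,b2), (a2,b3) and a1 + 2*b1 = a2 + 2*b3
        if x == a1 and y == b2 and z == a2 and a1 + 2 * b1 == a2 + 2 * b3:
            found.append((a1, a2, b1, b2, b3))
    return found

def encode(g):
    a1, a2, b1, b2, b3 = g
    return (b3, a1 + b2, a1 + b1)

def decode(code):
    b3, s12, s11 = code            # s12 = a1 + b2, s11 = a1 + b1
    a2_minus_b1 = s11 - 2 * b3     # from a1 + 2*b1 = a2 + 2*b3
    b1_minus_b2 = s11 - s12
    a2, b2 = EDGE_BY_DIFF[a2_minus_b1 + b1_minus_b2]  # edge with difference a2 - b2
    a1 = s12 - b2
    b1 = s11 - a1
    return (a1, a2, b1, b2, b3)

ALL = appearances(GAMMA)
```

Every appearance found by brute force, including degenerate ones with repeated vertices, is recovered exactly from its triple.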

4.2 Kakeya sets in finite fields 

In his influential survey on the Kakeya problem, Wolff [Wol99] defined the finite field analog 
of the problem. Below, F will denote a finite field of size q (not necessarily prime). 

Definition 4.2.1. A Kakeya set $K \subseteq \mathbb{F}^n$ is a set containing a line in every direction. More
formally, for all $x \in \mathbb{F}^n$ there exists $y \in \mathbb{F}^n$ such that $\{y + tx \mid t \in \mathbb{F}\} \subseteq K$.

Wolff asked whether a bound of the form $|K| \geq C_n \cdot q^n$ holds for all Kakeya sets $K$, with
$C_n$ a constant depending only on $n$. Here, one should think of $n$ as fixed and of the field size
$q$ as going to infinity (thinking of $q \sim 1/\epsilon$ helps). The proofs we saw in the previous section,
using additive combinatorics, can be carried out also over finite fields. For example, using the
$SD(\{1,2\}, 7/4)$ statement (over the abelian group $\mathbb{F}^n$) one gets a bound of $|K| \geq C_n \cdot q^{4n/7}$.
In [Dvi08], the polynomial method was used to give an answer to Wolff's question. Initially,
a bound of $C_n q^{n-1}$ was shown and then, using an observation of Alon and Tao, the tight
exponent $C_n q^n$ was also obtained. We will see both the original proof and the improvement,
which is another nice example of the usefulness of working in projective space.

4.2.1 Proof of the finite field Kakeya conjecture 

To start, we define Nikodym sets, which are closely related to Kakeya sets.

Definition 4.2.2. A Nikodym set $K \subseteq \mathbb{F}^n$ is a set for which, through every point not in $K$,
there is a line that intersects $K$ in all points but one. More formally, for all $y \notin K$ there
exists $x$ such that $\{y + tx \mid t \in \mathbb{F}^*\} \subseteq K$.

This definition seems 'stronger' than the Kakeya definition. However, the two definitions
are related by a factor of $q$.

Claim 4.2.3. If there exists a Kakeya set $K$ of size $T$ in $\mathbb{F}^n$ then there exists a Nikodym set
$M$ in $\mathbb{F}^n$ of size at most $qT$. In fact, one can take $M = \{tx \mid t \in \mathbb{F}, x \in K\}$.

Proof. For each $x \in \mathbb{F}^n$, there is $y \in \mathbb{F}^n$ such that $\{y + tx \mid t \in \mathbb{F}\} \subseteq K$. This means that
$\{sy + stx \mid s,t \in \mathbb{F}\} \subseteq M$. Fixing $t = 1/s$ and going over all $s \neq 0$ we get $\{sy + x \mid s \in \mathbb{F}^*\} \subseteq M$
and so $M$ is a Nikodym set. □

What can we do with a small Nikodym set $M$? Suppose we have a polynomial $f(x_1, \ldots, x_n)$
of degree $d$ and we know the values of $f$ on all points of $M$. If the degree of $f$ is less than
$q - 1$ we could use these values to recover the values of $f$ everywhere! To see this, suppose
we wish to find the value of $f$ at a point $x \notin M$. Let $y \in \mathbb{F}^n$ be such that the punctured line
$\ell = \{x + ty \mid t \in \mathbb{F}^*\}$ is contained in $M$. Restricting $f$ to the line $\ell$ we get a polynomial of
degree at most $d$ and we know its values in $q - 1 > d$ points. Therefore, we can recover the
coefficients of the restricted polynomial and compute its value at the missing point $x$. But
this means that the number of points in $M$ must be at least the number of coefficients in a
degree $q - 2$ polynomial since, otherwise, we could find a non-zero polynomial of degree at most $q - 2$
that vanishes everywhere in $M$. This would be a contradiction since,
using the above decoding procedure, we would get that the polynomial is zero everywhere.

The last step of this argument requires proving that a multivariate polynomial of degree less than $q$ that is not
identically zero has a non-zero value at some point in $\mathbb{F}^n$. This follows from the Schwartz-Zippel
Lemma:

Lemma 4.2.4. Let $f \in \mathbb{F}[x_1, \ldots, x_n]$ be a non-zero polynomial of degree $d$. Then there are
at most $dq^{n-1}$ points in $\mathbb{F}^n$ where $f$ vanishes.

Proof. By induction on $n$. The $n = 1$ case is the fact that a non-zero univariate polynomial of degree $d$ has at most $d$ roots. For larger
$n$, write $f(x_1, \ldots, x_n) = \sum_{j=0}^{r} g_j(x_2, \ldots, x_n) x_1^j$, such that w.l.o.g. $g_r(x_2, \ldots, x_n)$ is non-zero
of degree at most $d - r$. By induction, there are at most $(d - r)q^{n-2}$ zeros of $g_r$. For each one of
them, the restricted polynomial $f$ (which is now a polynomial in the single variable $x_1$) might
vanish identically and so have $q$ zeros. For the rest of the assignments (at most $q^{n-1}$) to $x_2, \ldots, x_n$,
$f$ will remain a non-zero univariate polynomial of degree $r$ and so can have at most $r$ zeros.
Combining, we get at most $(d - r)q^{n-1} + rq^{n-1} = dq^{n-1}$ zeros for $f$. □
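For small parameters the bound of Lemma 4.2.4 can be checked by brute force. The sketch below assumes a prime field $\mathbb{F}_p$ (so arithmetic is just integers mod $p$); the polynomial and the helper name `count_zeros` are hypothetical.

```python
from itertools import product

def count_zeros(poly, p, n):
    """Count zeros in F_p^n of a polynomial given as {exponent tuple: coefficient}."""
    zeros = 0
    for point in product(range(p), repeat=n):
        value = 0
        for exps, coeff in poly.items():
            term = coeff
            for x, e in zip(point, exps):
                term = term * pow(x, e, p) % p   # note pow(0, 0, p) == 1
            value = (value + term) % p
        zeros += (value == 0)
    return zeros

# Hypothetical example: f(x1, x2) = x1^2 * x2 + 3*x1 + 1 over F_7, of degree d = 3.
p, n, d = 7, 2, 3
f = {(2, 1): 1, (1, 0): 3, (0, 0): 1}
zeros = count_zeros(f, p, n)
bound = d * p ** (n - 1)   # the bound d * q^(n-1) of the lemma
```

In this example each non-zero value of $x_1$ determines a unique $x_2$, so there are 6 zeros, well below the bound of 21.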

We can now give the proof of the bound on Kakeya sets.

Theorem 4.2.5. For every Kakeya set $K \subseteq \mathbb{F}^n$ we have $|K| \geq (1/n!)q^{n-1}$.

Proof. Let $M$ be a Nikodym set of size $q|K|$. If $|K| \ll (1/n!)q^{n-1}$ then $|M| \ll (1/n!)q^n$
and we can find a polynomial $f$ of degree $d \leq q - 2$ that vanishes on $M$ and is not identically
zero. For every point $x \notin M$ consider the restriction of $f$ to the line $\ell$ passing through $x$
that has $q - 1$ points in $M$. The restriction of $f$ to this line is a polynomial of degree at most $d$ with $q - 1$ zeros and so,
since $d < q - 1$, it is identically zero and $f(x) = 0$. We get that $f$ must vanish everywhere, contradicting the Schwartz-Zippel
lemma. □

Using a tensoring argument, as we saw for Kakeya sets over the reals, one can amplify
this bound to $C_{n,\epsilon} q^{n-\epsilon}$ for all $\epsilon > 0$. There is, however, a clever way to get rid of this $\epsilon$
completely. This has to do with working over projective space.

Recall that the $n$ dimensional projective space $\mathbb{PF}^n$ is defined formally as the set of $n + 1$
dimensional non-zero vectors with two vectors identified if they are a constant multiple of
each other. We embed the affine space $\mathbb{F}^n$ in $\mathbb{PF}^n$ by adding a coordinate $x_0 = 1$ so that
the points at infinity are given by the hyperplane $x_0 = 0$. Recall that a line in $\mathbb{F}^n$ in
direction $y$ will hit the point at infinity with coordinates $(0, y_1, \ldots, y_n)$ (since multiplying
by a constant doesn't change the point, the choice of $y$ is also up to a constant). There is
a way to extend the polynomial method to work over the projective space. In projective
space, we only consider homogeneous polynomials, those in which every monomial has the
same degree. The set of zeros of a homogeneous polynomial of degree $d$ is well defined in $\mathbb{PF}^n$ since
$f(ax_0, \ldots, ax_n) = a^d f(x_0, \ldots, x_n)$ for all non-zero $a \in \mathbb{F}$. When we embed $\mathbb{F}^n$ into $\mathbb{PF}^n$
in the above described manner, we can accompany this with an embedding of $\mathbb{F}[x_1, \ldots, x_n]$
into the set of homogeneous polynomials in variables $x_0, x_1, \ldots, x_n$. This is done by sending
$f(x_1, \ldots, x_n)$ of degree $d$ into

$$f^h(x_0, x_1, \ldots, x_n) = x_0^d f(x_1/x_0, \ldots, x_n/x_0)$$

or, in other words, multiplying each monomial of $f$ of degree $d - r$ with $x_0^r$ so that the resulting
polynomial is homogeneous of the same degree as $f$. Notice that, setting $x_0 = 1$ in $f^h$ we
get $f$ back (thus, $f^h$ is consistent with the embedding of points in $\mathbb{F}^n$ into $\mathbb{PF}^n$). Also notice
that, setting $x_0 = 0$ in $f^h$ we get back the homogeneous part of $f$ of highest degree. This is
the restriction of $f^h$ to the hyperplane at infinity.
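The homogenization map is simple to carry out on a monomial representation. The following sketch stores a polynomial as a dictionary from exponent tuples to non-zero coefficients; this representation and the names `homogenize` and `set_x0` are assumptions made for illustration only.

```python
def homogenize(poly):
    """Map f, given as {(e1, ..., en): coeff}, to f^h as {(e0, e1, ..., en): coeff}."""
    d = max(sum(exps) for exps in poly)  # total degree of f
    # a monomial of degree d - r is multiplied by x0^r
    return {(d - sum(exps),) + exps: c for exps, c in poly.items()}

def set_x0(poly_h, value):
    """Substitute x0 = 0 or x0 = 1 in f^h; returns a polynomial in x1, ..., xn."""
    out = {}
    for (e0, *rest), c in poly_h.items():
        if value == 1 or e0 == 0:        # with x0 = 0 only monomials free of x0 survive
            out[tuple(rest)] = out.get(tuple(rest), 0) + c
    return out

# Hypothetical example: f(x1, x2) = x1^2 * x2 + 3*x1 + 1, of degree 3.
f = {(2, 1): 1, (1, 0): 3, (0, 0): 1}
f_h = homogenize(f)
```

Setting $x_0 = 1$ returns `f`, and setting $x_0 = 0$ leaves only the top-degree part $x_1^2 x_2$.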

Suppose $K \subseteq \mathbb{F}^n$ is a Kakeya set and embed $\mathbb{F}^n$ into $\mathbb{PF}^n$ using $x_0 = 1$. Let $K'$ be the
embedding of $K$ (which has the same size as $K$). Saying that $K$ contains a line in every
direction is the same as saying that, through each point at infinity $(0, y_1, \ldots, y_n)$, there is
a line that has $q$ points in $K'$. Suppose now we had a non-zero polynomial $f$ of degree $d \leq q - 1$
that vanished on $K$ and consider $f^h$ as above. Using the restrictions to all these lines (on each of which $f^h$ has degree at most $d$ and at least $q > d$ zeros), we
get that $f^h$ must vanish at all points at infinity. This means that the homogeneous part of
highest degree of $f$ (which is the same as $f^h(0, x_1, \ldots, x_n)$) vanishes on all of $\mathbb{F}^n$ and so, by the Schwartz-Zippel lemma, is identically zero. But this is a
contradiction since we assumed that $f$ is non-zero and so it must have a homogeneous part
of highest degree which is non-zero.

Using the above argument we get

Theorem 4.2.6. For all Kakeya sets $K \subseteq \mathbb{F}^n$ we have $|K| \geq (1/n!)q^n$.

In the finite field setting we also might care about the constant in front of the $q^n$ (this
doesn't appear in the real case since we are taking a limit). There is a better bound of $|K| \geq
(1/2^n)q^n$ on Kakeya sets proved in [DKSS09] which uses a more sophisticated polynomial
argument with zeros of high multiplicities.

4.2.2 A construction of small Kakeya sets 

We now turn to describing the smallest known Kakeya sets, which are of size

$$|K| \leq \frac{q^n}{2^{n-1}} + O(q^{n-1}),$$

which is, asymptotically as $q$ tends to infinity, within a factor of 2 of the lower bound
obtained in [DKSS09]. The construction for the case $n = 2$ was given by [MT04] and the
generalization for larger $n$ was observed by the author for odd characteristic and by [SS08]
for even characteristic. We give here the construction for odd characteristic.

We will only worry about lines in directions $b = (b_1, \ldots, b_n)$ with $b_n = 1$. The rest of
the lines can be added using an additional $q^{n-1}$ points, which is swallowed by the low order
term. Our set is defined as follows:

$$K = \{(v_1^2/4 + v_1 \cdot t, \ldots, v_{n-1}^2/4 + v_{n-1} \cdot t, t) \mid v_1, \ldots, v_{n-1}, t \in \mathbb{F}\}.$$

Let $b = (b_1, \ldots, b_{n-1}, 1)$ be some direction. Then $K$ clearly contains the line in direction $b$
through the point $(b_1^2/4, \ldots, b_{n-1}^2/4, 0)$. We now turn to showing that $|K| \lesssim q^n/2^{n-1}$. Notice
that the sum of the first coordinate of $K$ and the square of the last one is equal to

$$v_1^2/4 + v_1 \cdot t + t^2 = (v_1/2 + t)^2$$

and so is a square in $\mathbb{F}$. Since $\mathbb{F}$ has odd characteristic it contains at most $\approx q/2$ squares.
Let $x_1, \ldots, x_n$ denote the coordinates of the set $K$. Fixing the last coordinate we get that
the first coordinate $x_1$ can take at most $\approx q/2$ values. The same holds for $x_2, \ldots, x_{n-1}$ and
so we get a bound of $\approx q^n/2^{n-1}$ on the size of $K$.
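The construction can be verified directly for small parameters. The sketch below works over a prime field $\mathbb{F}_p$ with $p$ odd (an assumption; the text allows any finite field of odd characteristic) and only handles the directions with $b_n = 1$. The function names are hypothetical.

```python
from itertools import product

def small_kakeya_part(p, n):
    """Points (v_i^2/4 + v_i*t for i < n, t) over F_p, for an odd prime p."""
    inv4 = pow(4, -1, p)  # 1/4 in F_p (requires Python 3.8+)
    points = set()
    for *v, t in product(range(p), repeat=n):
        points.add(tuple((vi * vi * inv4 + vi * t) % p for vi in v) + (t,))
    return points

def contains_line(K, p, start, direction):
    """Check that {start + t*direction : t in F_p} is contained in K."""
    return all(
        tuple((s + t * b) % p for s, b in zip(start, direction)) in K
        for t in range(p)
    )

p, n = 7, 3
K = small_kakeya_part(p, n)
inv4 = pow(4, -1, p)
# For every direction (b_1, ..., b_{n-1}, 1), test the line through (b_i^2/4, ..., 0).
all_directions_covered = all(
    contains_line(K, p, tuple(b * b * inv4 % p for b in bs) + (0,), bs + (1,))
    for bs in product(range(p), repeat=n - 1)
)
num_squares = (p + 1) // 2  # number of squares in F_p, including 0
```

For fixed $t$ each of the first $n-1$ coordinates is a square shifted by $t^2$, so $|K| \leq p \cdot ((p+1)/2)^{n-1}$, which is 112 out of 343 points in this example.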

4.3 Randomness Mergers from Kakeya sets. 

In CS, the interest in the finite field Kakeya problem originated in the work of Lu, Reingold,
Vadhan and Wigderson [LRVW03]. Motivated by extractor constructions, the following
question was raised: Suppose $X_1, \ldots, X_k$ are random variables each distributed over $\mathbb{F}^n$,
where $\mathbb{F}$ is a finite field of order $q$. We do not assume that the $X_i$'s are independent and are
guaranteed only that one of them is uniformly distributed over $\mathbb{F}^n$. The question is, what
can we say about the entropy of a random linear combination of $X_1, \ldots, X_k$? To make things
simpler, suppose we only have two variables $X, Y \in \mathbb{F}^n$ such that $X$ is uniform on $\mathbb{F}^n$ and $Y$
could depend on $X$. Let $Z = aX + bY$, where $a, b \in \mathbb{F}$ are both chosen uniformly at random
and independently of $X, Y$ and of each other. How 'random' is $Z$?

The connection between this question and the finite field Kakeya problem is as follows:
Suppose we had a small Kakeya set $K \subseteq \mathbb{F}^n$ and take $M = \{ax \mid x \in K, a \in \mathbb{F}\}$ to be the
corresponding Nikodym set (see previous section), which is of size at most $q|K|$. We
know that for each $x \in \mathbb{F}^n$ there exists $y = y(x) \in \mathbb{F}^n$ such that $\{y(x) + tx \mid t \in \mathbb{F}\} \subseteq K$.
This means that $\{stx + sy(x) \mid s,t \in \mathbb{F}\} \subseteq M$. Renaming $st = a$ and $s = b$ we get that
$\{ax + by(x) \mid a \in \mathbb{F}, b \in \mathbb{F}^*\} \subseteq M$. What this means is that, given $X$, one could set $Y = Y(X)$
such that all linear combinations $aX + bY$ with $b$ non-zero hit the small set $M$. This means
that the output will land in $M$ with high probability (at least $1 - 1/q$), which would imply that
$Z = aX + bY$ has low entropy (e.g., when using min-entropy). Thus, to answer the question
of [LRVW03] we must (at the very least) solve the finite field Kakeya conjecture! This problem
is even more challenging, since it involves entropy and randomness (which we still need to
define properly). Luckily, the polynomial method is sufficiently robust to handle even this
harder scenario.

We start with some definitions. The statistical distance between two distributions $P$ and
$Q$ on a finite domain $\Omega$ is defined as

$$\max_{S \subseteq \Omega} |P(S) - Q(S)|.$$

We say that $P$ is $\epsilon$-close to $Q$ if the statistical distance between $P$ and $Q$ is at most $\epsilon$. The
min-entropy of a random variable $X$ is defined as

$$H_\infty(X) = \min_{x \in \mathrm{supp}(X)} \log\left(\frac{1}{\Pr[X = x]}\right)$$

(all logarithms are taken to the base 2). Intuitively, having min entropy at least $k$ means
having at least $k$ bits of entropy. We say that a random variable $X$ is $\epsilon$-close to having min
entropy $k$ if there exists another random variable $X'$ such that $X'$ has min entropy $\geq k$ and
$X$ is $\epsilon$-close to $X'$.

Notice that a r.v. $X$ distributed over $\mathbb{F}^n$ can have min entropy between zero and $n\log(q)$.
If $X$ has min entropy $\beta n\log(q)$ we call $\beta$ the min entropy rate of $X$. The following lemma is
very useful and allows us to move from min entropy to set size:

Lemma 4.3.1. Say $X$ is distributed over a finite set $\Omega$ and $X$ is not $\epsilon$-close to having min
entropy at least $k$. Then there exists a set $T \subseteq \Omega$ with $|T| \leq 2^k$ such that $\Pr[X \in T] \geq \epsilon$.

Proof. Take $T = \{a \in \Omega \mid \Pr[X = a] > 2^{-k}\}$. Clearly, $|T| \leq 2^k$ since the sum of probabilities
$\Pr[X = a]$ cannot exceed one. If $\Pr[X \in T] < \epsilon$ we could change the distribution of $X$ slightly
by moving the probability mass from $T$ to other values so that the resulting r.v. $X'$ will have
min entropy $\geq k$ and will be $\epsilon$-close to $X$. □
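These definitions translate directly into code for distributions given as explicit tables. The sketch below is an illustration only; the two distributions and the function names are hypothetical, and the statistical distance is computed through the standard identity that the maximum over events equals half the $\ell_1$ distance.

```python
from fractions import Fraction
from math import log2

def statistical_distance(P, Q):
    """max over events S of |P(S) - Q(S)|, computed as half the L1 distance."""
    support = set(P) | set(Q)
    return sum(abs(P.get(x, 0) - Q.get(x, 0)) for x in support) / 2

def min_entropy(P):
    """Largest k such that every outcome has probability at most 2^-k."""
    return -log2(max(P.values()))

def heavy_set(P, k):
    """The set T from the proof of Lemma 4.3.1 (k a non-negative integer here)."""
    return {x for x, pr in P.items() if pr > Fraction(1, 2 ** k)}

# Hypothetical distributions on {0, 1, 2, 3}.
uniform = {x: Fraction(1, 4) for x in range(4)}
skewed = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
```

The skewed distribution has min entropy 1, is at statistical distance 1/4 from uniform, and its heavy set for $k = 2$ is the single outcome 0.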



We start by analyzing the case of two random variables: 

Theorem 4.3.2. Let $X, Y$ be two (not necessarily independent) random variables distributed
over $\mathbb{F}^n$ and suppose one of them is uniformly distributed. Let $a, b \in \mathbb{F}$ be chosen independently
at random and let $Z = aX + bY$. Let $\alpha > 0$ be any real number such that $q > n^{10/\alpha}$. Then
$Z$ is $\epsilon$-close to having min entropy rate $1 - \alpha$ with $\epsilon = q^{-\alpha/10}$.

Proof. By symmetry we may assume w.l.o.g. that $X$ is uniform. If $Z$ is not $\epsilon$-close to having
min entropy rate $1 - \alpha$ then, by Lemma 4.3.1, there is a set $T \subseteq \mathbb{F}^n$ of size $|T| \leq q^{(1-\alpha)n}$
such that $\Pr[Z \in T] \geq \epsilon$. Using the polynomial method, we will find a non-zero polynomial
$f \in \mathbb{F}[x_1, \ldots, x_n]$ of low degree that vanishes on $T$. Let $d$ be the required degree. We need $d$
to satisfy

$$\binom{n+d}{d} > q^{(1-\alpha)n}.$$

Using the inequality $\binom{n+d}{d} \geq (d/n)^n$ and the bound $q > n^{10/\alpha}$ we see that it is enough to
take $d = q^{1-\alpha/5}$.

For each $x \in \mathbb{F}^n$ let

$$p_x = \Pr[Z \in T \mid X = x]$$

and let

$$G = \{x \in \mathbb{F}^n \mid p_x > \epsilon/2\}.$$

Since $\Pr[Z \in T] \geq \epsilon$ we have that $\Pr[X \in G] \geq \epsilon/2$ (this follows from a simple averaging
argument). Since $X$ is uniform this implies $|G| \geq (\epsilon/2)q^n$. We will now show that $f$ vanishes
on all points in $G$.

Fix some $x \in G$. We know that

$$\Pr[aX + bY \in T \mid X = x] > \epsilon/2.$$

Thus, we can fix $Y = y$ to some specific value so that the same inequality still holds. That
is, there is some $y \in \mathbb{F}^n$ such that

$$\Pr[ax + by \in T] > \epsilon/2.$$

Notice that in the last probability the randomness is only over the choice of $a, b$ and that $x, y$
are both fixed. Let $g(a,b) = f(ax + by)$ be the restriction of $f$ to the plane spanned by $x, y$.
By the above calculation we get that $g$ has at least $(\epsilon/2)q^2$ zeros. We know that a non-zero $g$ can have
at most $dq$ zeros (see the Schwartz-Zippel lemma from the previous section) and so, if $d < (\epsilon/2)q$
(which holds for our choice of parameters), we get that $g(a,b)$ is identically zero. Thus,
we have that $g(1,0) = f(x) = 0$ and so we conclude that $f$ vanishes on all of $G$.

Now, since $f$ can have at most $dq^{n-1}$ zeros (by Schwartz-Zippel) we get

$$(\epsilon/2)q^n \leq |G| \leq dq^{n-1},$$

which is a contradiction for the choice of $\epsilon$ given in the theorem. This concludes the proof. □

Looking at things more broadly, a procedure such as the one described above is called a
merger. Mergers allow us to combine several (dependent) random variables, one of which is
uniform, into a single variable that has high min entropy. Mergers are allowed to use a short
random 'seed' (given above by $a, b \in \mathbb{F}$) and one can show that without this seed the task is
impossible. Above we analyzed a simple merger for two sources. Mergers for many sources
are important in constructions of seeded extractors, which are procedures that can extract
randomness from arbitrary distributions of low min entropy and that use an additional short
random seed. One can generalize the construction above to work with many sources (taking
independent coefficients $a_1, \ldots, a_k$ and outputting $\sum_i a_i \cdot X_i$). This is problematic, however,
since the length of the seed grows linearly with the number of sources. One can, however,
pick the coefficients in a correlated way and get a merger with shorter seed. This is done by
passing a curve of degree $k$ through the $k$ points $X_1, \ldots, X_k$ and outputting a random point
on this curve. The analysis given above, using the polynomial method, generalizes to this
setting as well (see [DW08, DKSS09]).

Chapter 5

Sylvester-Gallai type problems



5.1 Sylvester-Gallai type theorems over the reals

The Sylvester-Gallai (SG) theorem states that, in any configuration of $n$ points in the real
plane, not all on the same line, there exists a line passing through exactly two of the points.
Another way of stating it is that if, in a configuration of points, every pair of points
is collinear with a third point, then all points must lie on the same line. This theorem has an
extremely simple proof: Suppose the points $v_1, \ldots, v_n$ are not on a line and that every pair of them is collinear with a third point. Let $v_1$ be a point
such that the distance between $v_1$ and some line not containing it, say $\ell_{v_2,v_3}$, is minimal among all such distances
(i.e., between a point and a line defined by the set of points). Now, the line $\ell_{v_2,v_3}$ contains
a third point $v_4$. One can draw a picture and see that one of the distances $\mathrm{dist}(v_i, \ell_{v_1,v_j})$,
with $i \neq j$ in $\{2,3,4\}$, is smaller than $\mathrm{dist}(v_1, \ell_{v_2,v_3})$, a contradiction. □

Over the complex numbers this theorem is no longer true! There are configurations of 
points that lie in a two dimensional plane and with the property that every pair is collinear 
with a third point. The complex SG theorem, proved by Kelly in [Kel86], says that this 
is the highest dimension possible and that every such configuration is contained in some 
two dimensional affine (complex) plane. The proof of Kelly's theorem originally used
deep tools from Algebraic Geometry, but recently an elementary proof was found by Elkies,
Pretorius and Swanepoel [ES06].

A nice way to think about the SG theorem (and the way which leads to interesting generalizations)
is as translating local information (about collinear triples) into global information
(all points being on a line). We will now study a more relaxed version of this question when
the local information is partial. We start with some definitions. The affine dimension of
a set of points $\dim(v_1, \ldots, v_n)$ is the dimension of the smallest affine subspace containing
them. Given $v_1, \ldots, v_n$ we call a line passing through at least two points in the set special if
it contains at least three points in the configuration. Otherwise we call the line an ordinary
line. So, the standard SG theorem says that, in every configuration of dimension at least 2
(or 3 over the complex numbers) there is at least one ordinary line.

Definition 5.1.1 ($\delta$-SG configuration). Let $\delta \in [0,1]$. The $n$ distinct points $v_1, \ldots, v_n \in \mathbb{C}^d$
are called a $\delta$-SG configuration if for every $i \in [n]$ there exists a family of special lines $L_i$, all
passing through $v_i$, such that at least $\delta n$ of the points $v_1, \ldots, v_n$ are on the lines in $L_i$. (Note that
each collection $L_i$ may cover a different subset of the $n$ points.)

We will now prove the following theorem:

Theorem 5.1.2 (Quantitative SG theorem). Let $\delta \in (0,1]$. Let $v_1, \ldots, v_n \in \mathbb{C}^d$ be a $\delta$-SG
configuration. Then

$$\dim\{v_1, \ldots, v_n\} \leq O(1/\delta^2).$$

This theorem, proven in [BDYW11], does not imply Kelly's theorem since, for $\delta = 1$, we
do not get the constant 2 (in the original paper the constant 10 is arrived at). A more recent
work [DSW12] improves the techniques in the proof of Theorem 5.1.2 to give a quantitatively
better bound of $O(1/\delta)$, which also gives the constant 2 for $\delta = 1$ over the complex numbers
(which gives a new proof of Kelly's theorem).

5.1.1 Rank of design matrices 

The proof of Theorem 5.1.2 is by reduction to a question about the rank of matrices with 
certain restrictions on their zero/nonzero patterns. These are called design matrices: 

Definition 5.1.3 (Design matrix). Let $A$ be an $m \times n$ matrix over some field. For $i \in [m]$
let $R_i \subseteq [n]$ denote the set of indices of all non-zero entries in the $i$'th row of $A$. Similarly,
let $C_j \subseteq [m]$, $j \in [n]$, denote the set of non-zero indices in the $j$'th column. We say that $A$ is
a $(q,k,t)$-design matrix if

1. For all $i \in [m]$, $|R_i| \leq q$.

2. For all $j \in [n]$, $|C_j| \geq k$.

3. For all $j_1 \neq j_2 \in [n]$, $|C_{j_1} \cap C_{j_2}| \leq t$.
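The three conditions depend only on the zero/non-zero pattern, so they are easy to compute for a concrete matrix. The sketch below returns the tightest parameters; the function name and the example matrix are hypothetical.

```python
def design_parameters(A):
    """Return the tightest (q, k, t) for which A is a (q, k, t)-design matrix."""
    m, n = len(A), len(A[0])
    rows = [{j for j in range(n) if A[i][j] != 0} for i in range(m)]  # the sets R_i
    cols = [{i for i in range(m) if A[i][j] != 0} for j in range(n)]  # the sets C_j
    q = max(len(r) for r in rows)
    k = min(len(c) for c in cols)
    t = max(len(cols[j1] & cols[j2]) for j1 in range(n) for j2 in range(j1 + 1, n))
    return q, k, t

# Hypothetical example: the rows are the supports of the 6 pairs of 4 columns.
A = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
```

For this matrix every row has two non-zero entries, every column has three, and any two columns share exactly one row.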

The reason for studying these matrices in connection with SG configurations will become 
clear later. For now, let us state the main result we will need to prove: 

Theorem 5.1.4 (Rank of design matrices). Let $A$ be an $m \times n$ complex $(q,k,t)$-design matrix.
Then

$$\mathrm{rank}(A) \geq n - \left(\frac{q \cdot t \cdot n}{2k}\right)^2.$$

We will prove Theorem 5.1.4 in Section 5.2 and will continue now with the proof of
Theorem 5.1.2.

5.1.2 Proof of Theorem 5.1.2 using the rank bound 

Let $V$ be the $n \times d$ matrix whose $i$'th row is the vector $v_i$. Assume w.l.o.g. that $v_1 = 0$. Thus

$$\dim\{v_1, \ldots, v_n\} = \mathrm{rank}(V).$$

The overview of the proof is as follows. We will first build an $m \times n$ matrix $A$ that will
satisfy $A \cdot V = 0$. Then, we will argue that the rank of $A$ is large because it is a design
matrix. This will show that the rank of $V$ is small.

Consider a special line $\ell$ which passes through three points $v_i, v_j, v_k$. This gives a linear
dependency among the three vectors $v_i, v_j, v_k$ (we identify a point with its vector of coordinates
in the standard basis). In other words, this gives a vector $a = (a_1, \ldots, a_n)$ which is
non-zero only in the three coordinates $i, j, k$ and such that $a \cdot V = 0$. If $a$ is not unique, choose
an arbitrary vector $a$ with these properties. Our strategy is to pick a family of collinear triples
among the points in our configuration and to build the matrix $A$ from rows corresponding to
these triples in the above manner.

We will need the following combinatorial lemma. 

Lemma 5.1.5. Let $r \geq 3$. Then there exists a set $T \subseteq [r]^3$ of $r^2 - r$ triples that satisfies the
following properties:

1. Each triple $(t_1, t_2, t_3) \in T$ is of three distinct elements.

2. For each $i \in [r]$ there are exactly $3(r - 1)$ triples in $T$ containing $i$ as an element.

3. For every pair $i, j \in [r]$ of distinct elements there are at most 6 triples in $T$ which
contain both $i$ and $j$ as elements.

Proof. This follows from a result of Hilton [Hil73] on diagonal Latin squares and we will omit
it (see [BDYW11] for more details). □
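For odd $r$ a family of triples with these properties can be written down explicitly and checked by computer. The sketch below is not the construction of [Hil73], which covers all $r \geq 3$; it assumes $r$ is odd and uses the idempotent Latin square $L[i][j] = (i + j)(r + 1)/2 \bmod r$, identifying $[r]$ with $\{0, \ldots, r-1\}$. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def triples_from_latin_square(r):
    """Triples (i, j, L[i][j]), i != j, for L[i][j] = (i + j)(r + 1)/2 mod r, r odd."""
    assert r >= 3 and r % 2 == 1
    half = (r + 1) // 2  # the inverse of 2 modulo r, so L[i][i] == i
    return [(i, j, (i + j) * half % r) for i in range(r) for j in range(r) if i != j]

def check_lemma_properties(T, r):
    """Check the size of T and the three properties listed in Lemma 5.1.5."""
    distinct = all(len(set(tr)) == 3 for tr in T)
    per_element = Counter(x for tr in T for x in tr)
    per_pair = Counter(frozenset(pr) for tr in T for pr in combinations(tr, 2))
    return (
        len(T) == r * r - r
        and distinct
        and all(per_element[i] == 3 * (r - 1) for i in range(r))
        and max(per_pair.values()) <= 6
    )
```

Each element appears $r - 1$ times in each of the three positions, and two elements can occupy a given ordered pair of positions in at most one triple, which gives the bound of 6.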

Let $\mathcal{L}$ denote the set of all special lines in the configuration (i.e., all lines containing at
least three points). Then each $L_i$ is a subset of $\mathcal{L}$ containing lines passing through $v_i$. For
each $\ell \in \mathcal{L}$ let $V_\ell$ denote the set of points in the configuration which lie on the line $\ell$. Then
$|V_\ell| \geq 3$ and we can assign to it a family of triples $T_\ell \subseteq V_\ell^3$, given by Lemma 5.1.5 (we identify
$V_\ell$ with $[r]$, where $r = |V_\ell|$, in some arbitrary way).

We now construct the matrix $A$ by going over all lines $\ell \in \mathcal{L}$ and for each triple in $T_\ell$
adding as a row of $A$ the vector with three non-zero coefficients $a = (a_1, \ldots, a_n)$ described
above (so that $a$ is the linear dependency between the three points in the triple).

Since the matrix $A$ satisfies $A \cdot V = 0$ by construction, we only have to argue that $A$ is a
design matrix and bound its rank.

Claim 5.1.6. The matrix $A$ is a $(3, 3k, 6)$-design matrix, where $k = \lfloor \delta n \rfloor - 1$.

Proof. By construction, each row of $A$ has exactly 3 non-zero entries. The number of non-zero
entries in column $i$ of $A$ corresponds to the number of triples we used that contain the point
$v_i$. These can come from all special lines containing $v_i$. Suppose there are $s$ special lines
containing $v_i$ and let $r_1, \ldots, r_s$ denote the number of points on each of those lines. Then,
since the lines through $v_i$ have only the point $v_i$ in common, we have that

$$\sum_{j=1}^{s} (r_j - 1) \geq k.$$

The properties of the families of triples $T_\ell$ guarantee that there are $3(r_j - 1)$ triples containing
$v_i$ coming from the $j$'th line. Therefore there are at least $3k$ triples in total containing $v_i$.

The size of the intersection of columns $i_1$ and $i_2$ is equal to the number of triples containing
the points $v_{i_1}, v_{i_2}$ that were used in the construction of $A$. These triples can only come from
one special line (the line containing these two points) and so, by Lemma 5.1.5, there can be
at most 6 of those. □

Applying Theorem 5.1.4 we get that

$$\mathrm{rank}(A) \geq n - \left(\frac{3 \cdot 6 \cdot n}{2 \cdot 3k}\right)^2 = n - \left(\frac{3n}{k}\right)^2.$$

Since $A \cdot V = 0$, this gives $\mathrm{rank}(V) \leq n - \mathrm{rank}(A) \leq (3n/k)^2 = O(1/\delta^2)$, which completes the proof. □

5.1.3 Extensions to other fields

We will discuss Sylvester-Gallai type problems over small finite fields in Section 5.3. For now,
let us see that Theorem 5.1.4 extends to any field of characteristic zero (or very large finite
characteristic). Since the derivation of the $\delta$-SG bound from the rank bound was field independent, we can also
extend Theorem 5.1.2 to these fields.

The argument is quite generic and relies on Hilbert's Nullstellensatz.

Definition 5.1.7 ($T$-matrix). Let $m, n$ be integers and let $T \subseteq [m] \times [n]$. We call an $m \times n$
matrix $A$ a $T$-matrix if all entries of $A$ with indices in $T$ are non-zero and all entries with
indices outside $T$ are zero.

Theorem 5.1.8 (Effective Hilbert's Nullstellensatz [Kol88]). Let $g_1, \ldots, g_s \in \mathbb{Z}[y_1, \ldots, y_t]$ be
degree $d$ polynomials with coefficients in $\{0,1\}$ and let

$$Z \triangleq \{y \in \mathbb{C}^t \mid g_i(y) = 0 \ \forall i \in [s]\}.$$

Suppose $h \in \mathbb{Z}[y_1, \ldots, y_t]$ is another polynomial with coefficients in $\{0,1\}$ which vanishes on
$Z$. Then there exist positive integers $C, D$ and polynomials $f_1, \ldots, f_s \in \mathbb{Z}[y_1, \ldots, y_t]$ such that

$$\sum_{i=1}^{s} f_i \cdot g_i \equiv C \cdot h^D.$$

Furthermore, one can bound $C$, $D$ and the maximal absolute value of the coefficients of the
$f_i$'s by an explicit function $H(d,t,s)$.

Theorem 5.1.9. Let $m, n, r$ be integers and let $T \subseteq [m] \times [n]$. Suppose that all complex
$T$-matrices have rank at least $r$. Let $\mathbb{F}$ be a field of either characteristic zero or of finite large
enough characteristic $p > P_0(n,m)$, where $P_0$ is some explicit function of $n$ and $m$. Then,
the rank of all $T$-matrices over $\mathbb{F}$ is at least $r$.

Proof. Let $g_1, \ldots, g_s \in \mathbb{C}[\{x_{ij} \mid i \in [m], j \in [n]\}]$ be the determinants of all $r \times r$ sub-matrices
of an $m \times n$ matrix of variables $X = (x_{ij})$. The statement "all $T$-matrices have rank at
least $r$" can be phrased as "if $x_{ij} = 0$ for all $(i,j) \notin T$ and $g_k(X) = 0$ for all $k \in [s]$ then
$\prod_{(i,j) \in T} x_{ij} = 0$". That is, if all entries outside $T$ are zero and $X$ has rank smaller than $r$ then
it must have at least one zero entry also inside $T$. From the Nullstellensatz we know that there
are integers $\alpha, \lambda > 0$ and polynomials $f_1, \ldots, f_s$ and $h_{ij}$, $(i,j) \notin T$, with integer coefficients
such that

$$\alpha \cdot \Big( \prod_{(i,j) \in T} x_{ij} \Big)^{\lambda} \equiv \sum_{(i,j) \notin T} x_{ij} \cdot h_{ij}(X) + \sum_{k=1}^{s} f_k(X) \cdot g_k(X). \qquad (5.1)$$

This identity implies the high rank of $T$-matrices also over any field $\mathbb{F}$ in which $\alpha \neq 0$. Since
we have a bound on $\alpha$ in terms of $n$ and $m$ the result follows. □



5.2 Rank lower bound for design matrices 

We will now prove Theorem 5.1.4. First, we discuss a simpler case:

5.2.1 The bounded entries case

When the ratios between different entries of the matrix are bounded in absolute value (say,
they are all in $[1/c, c]$ for some constant $c$), the proof is quite easy. Observe that, in this
case, the $n \times n$ matrix $M = A^*A$ ($A^*$ is the conjugate transpose) is a Hermitian matrix with
diagonal elements which are all at least $k/c$ in absolute value and off-diagonal elements which are at most
$tc$ in absolute value. Also notice that it is enough to give a lower bound on the rank of $M$
since it is equal to the rank of $A$. This bound will follow from the following simple lemma
(see [Alo09] for more on this lemma), which provides a bound on the rank of matrices whose
diagonal entries are much larger than the off-diagonal ones.






Lemma 5.2.1. Let $A = (a_{ij})$ be an $n \times n$ complex Hermitian matrix and let $0 < \ell < L$ be
integers. Suppose that $a_{ii} \geq L$ for all $i \in [n]$ and that $|a_{ij}| \leq \ell$ for all $i \neq j$. Then

$$\mathrm{rank}(A) \geq \frac{n}{1 + n(\ell/L)^2} \geq n - (n\ell/L)^2.$$

Proof. We can assume w.l.o.g. that $a_{ii} = L$ for all $i$. If not, then we can make the inequality
into an equality by multiplying the $i$'th row and column by $(L/a_{ii})^{1/2} \leq 1$ without changing
the rank or breaking the symmetry (the off-diagonal entries only shrink in absolute value). Let $r = \mathrm{rank}(A)$ and let $\lambda_1, \ldots, \lambda_r$ denote the non-zero
eigenvalues of $A$ (counting multiplicities). Since $A$ is Hermitian we have that the $\lambda_i$'s are
real. We have

$$n^2 \cdot L^2 = \mathrm{tr}(A)^2 = \Big( \sum_{i=1}^{r} \lambda_i \Big)^2 \leq r \cdot \sum_{i=1}^{r} \lambda_i^2 = r \cdot \sum_{i,j} |a_{ij}|^2 \leq r \cdot (n \cdot L^2 + n^2 \cdot \ell^2).$$

Rearranging we get the required bound. The second inequality in the statement of the lemma
follows from the fact that $1/(1 + x) \geq 1 - x$ for all $x \geq 0$. □
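The lemma can be sanity-checked on a small matrix with exact arithmetic. The sketch below uses a real symmetric (hence Hermitian) example and a plain Gaussian-elimination rank over the rationals; the example matrix and the helper `rank` are hypothetical and only meant as an illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, len(M)):
            factor = M[i][col] / M[rk][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[rk])]
        rk += 1
    return rk

# Hypothetical example: diagonal entries L = 3, off-diagonal entries +1 or -1 (ell = 1).
n, L, ell = 5, 3, 1
A = [[L if i == j else (1 if (i + j) % 2 else -1) for j in range(n)] for i in range(n)]
bound = Fraction(n) / (1 + n * Fraction(ell, L) ** 2)  # n / (1 + n (ell/L)^2)
```

Here the bound evaluates to 45/14, so the lemma guarantees rank at least 4; this particular matrix equals $4I - ss^T$ for a sign vector $s$ and has full rank 5.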

Plugging in the parameters in the above lemma we get a rank bound of $n - O((nt/k)^2)$ on
the rank of $M$, which is what we wanted to prove (in this case, the parameter $q$ is not used).
To handle the general case, when the entries are not bounded, we will use a technique called
matrix scaling.

Remark: Notice that it would suffice if the entries in each row were bounded (i.e., in a
range $[1/c, c]$) since then we could scale each row to get a bounded ratio matrix. Recalling
our application to the SG theorem, in which each row corresponded to a collinear triple, we
can see that such an unbalanced triple (where some ratio is very large) can only come from
a collinear triple $v_1, v_2, v_3$ such that the distance between $v_1, v_2$ is, say, much larger than the
distance between $v_2, v_3$. Hence, if we know, for some reason, that such triples do not exist in
our configuration we can just apply the above argument without a need for further work.

5.2.2 Matrix scaling 

We now define matrix scaling: 

Definition 5.2.2. [Matrix scaling] Let $A$ be an $m \times n$ complex matrix. Let $\rho \in \mathbb{C}^m$, $\gamma \in \mathbb{C}^n$
be two complex vectors with all entries non-zero. We denote by

$$SC(A, \rho, \gamma)$$

the matrix obtained from $A$ by multiplying the $(i,j)$'th element of $A$ by $\rho_i \cdot \gamma_j$. We say that
two matrices $A, B$ of the same dimensions are a scaling of each other if there exist non-zero
vectors $\rho, \gamma$ such that $B = SC(A, \rho, \gamma)$. It is easy to check that this is an equivalence relation.
We refer to the elements of the vector $\rho$ as the row scaling coefficients and to the elements
of $\gamma$ as the column scaling coefficients. Notice that two matrices which are a scaling of each
other have the same rank and the same pattern of zero and non-zero entries.
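
The definition can be illustrated with a minimal Python sketch. The function names `scale` and `zero_pattern` and the sample matrix are illustrative assumptions, not part of the original text; the sketch only shows that scaling by non-zero coefficients leaves the zero/non-zero pattern unchanged.

```python
def scale(A, rho, gamma):
    """Return SC(A, rho, gamma): entry (i, j) of A multiplied by rho[i] * gamma[j]."""
    if any(r == 0 for r in rho) or any(g == 0 for g in gamma):
        raise ValueError("all scaling coefficients must be non-zero")
    return [[rho[i] * gamma[j] * A[i][j] for j in range(len(gamma))]
            for i in range(len(rho))]


def zero_pattern(A):
    """Boolean matrix marking the non-zero entries of A."""
    return [[entry != 0 for entry in row] for row in A]


# A hypothetical 2 x 3 complex matrix and arbitrary non-zero coefficients.
A = [[1, 2, 0],
     [0, 3j, 4]]
B = scale(A, rho=[2, -1], gamma=[1, 0.5, 1j])
```

Here `zero_pattern(A)` and `zero_pattern(B)` coincide, as the definition predicts.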

Matrix scaling originated in a paper of Sinkhorn [Sin64] and has been widely studied
since (see [LSW00] for more background). The goal is to find a scaling that satisfies certain
conditions on the row/column sums. For example, given a square matrix (say, with non-negative
entries), we would like to find a scaling that makes the matrix doubly stochastic
(i.e., with all row sums and all column sums equal to one). Sinkhorn showed that, if all entries
are positive (no zeros), this is possible. The proof uses an iterative algorithm: keep
normalizing the row sums and the column sums in alternating steps. This will converge to
a scaling that gives a doubly stochastic matrix (for a more efficient variant see [LSW00]). If
the matrix contains zeros things are a bit trickier. Take for example the $2 \times 2$ matrix



It is clear that there is no scaling of this matrix that makes it doubly stochastic. However, we
can 'almost' achieve this by making the row/column sums arbitrarily close to 1. Sometimes,
this approximate scaling is good enough, as we shall see in our application. Clearly, we
need some condition on the pattern of zeros and non-zeros of the matrix (or at least that
no row/column is zero!). The following definition will give a condition that will
suffice for our purposes (a more general condition which is both necessary and sufficient is
known. See [BDYW11, RS89]).

Definition 5.2.3 (Non-zero diagonal). Let $A$ be an $n \times n$ real matrix. We say that $A$ has a
non-zero diagonal if all of its diagonal entries are non-zero. If $A$ is an $nk \times n$ matrix we say
that $A$ has non-zero diagonal if its rows can be reordered so that for each $i = 0, \ldots, k-1$ the
rows $in+1, \ldots, in+n$ form an $n \times n$ matrix with non-zero diagonal (i.e., $A$ is, up to ordering,
a concatenation of square non-zero diagonal matrices).

The following is a special case of a theorem from [RS89] that gives sufficient conditions
for finding a scaling of a matrix which has certain row and column sums.

Theorem 5.2.4 (Matrix scaling theorem). Let $A$ be an $nk \times n$ real matrix with non-negative
entries and non-zero diagonal. Then, for every $\varepsilon > 0$, there exists a scaling $A'$ of $A$ such that
the sum of each row of $A'$ is at most $1 + \varepsilon$ and the sum of each column of $A'$ is at least $k - \varepsilon$.
Moreover, the scaling coefficients used to obtain $A'$ are all positive real numbers.

The proof of the theorem uses convex programming techniques. One defines an appropriate
function and shows that, at the points at which it is minimized, the vanishing of the
partial derivatives gives the required bounds on the row/column sums. We will prove this
theorem in Section 5.2.4.
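
The alternating normalization attributed to Sinkhorn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sinkhorn`, the iteration count and both test matrices are choices made here, and the upper triangular matrix is an assumed stand-in for a matrix with a zero entry that has no exact doubly stochastic scaling.

```python
def sinkhorn(A, iterations=200):
    """Alternately normalize the row sums and column sums of a square non-negative matrix."""
    M = [[float(x) for x in row] for row in A]
    n = len(M)
    for _ in range(iterations):
        for i in range(n):                      # make every row sum equal to 1
            s = sum(M[i])
            M[i] = [x / s for x in M[i]]
        for j in range(n):                      # make every column sum equal to 1
            s = sum(M[i][j] for i in range(n))
            for i in range(n):
                M[i][j] /= s
    return M


def row_sums(M):
    return [sum(row) for row in M]


positive = sinkhorn([[1, 2], [3, 4]])      # all entries positive: converges
triangular = sinkhorn([[1, 1], [0, 1]])    # has a zero: only approximate scaling
```

For the positive matrix the row sums end up numerically equal to 1. For the triangular matrix the off-diagonal entry shrinks towards zero without ever reaching it, so the row sums only approach 1, which matches the 'almost' doubly stochastic behaviour described in the text.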

We will need the following easy corollary of the above theorem. 

Corollary 5.2.5 ($\ell_2$-scaling). Let $A = (a_{ij})$ be an $nk \times n$ complex matrix with non-zero
diagonal. Then, for every $\varepsilon > 0$, there exists a scaling $A' = (a'_{ij})$ of $A$ such that for every $i \in [nk]$

$$\sum_{j \in [n]} |a'_{ij}|^2 \le 1 + \varepsilon$$

and for every $j \in [n]$

$$\sum_{i \in [nk]} |a'_{ij}|^2 \ge k - \varepsilon.$$

Proof. Let $B = (b_{ij}) = (|a_{ij}|^2)$. Then $B$ is a real non-negative matrix with non-zero diagonal.
Applying Theorem 5.2.4 we get that for all $\varepsilon > 0$ there exists a scaling $B' = SC(B, \rho, \gamma)$,
with $\rho, \gamma$ positive real vectors, which has row sums at most $1 + \varepsilon$ and column sums at least
$k - \varepsilon$. Letting $\rho'_i = \sqrt{\rho_i}$ and $\gamma'_j = \sqrt{\gamma_j}$ we get a scaling $SC(A, \rho', \gamma')$ of $A$ with the required
properties. □

5.2.3 Proof of Theorem 5.1.4

To prove the theorem we will first find a scaling of $A$ so that the norms (squared) of the
columns are large and such that each entry is small.

Our first step is to find an $nk \times n$ matrix $B$ with non-zero diagonal that will be composed
from rows of $A$ s.t. each row is repeated with multiplicity between $0$ and $q$. To achieve this
we will describe an algorithm that builds the matrix $B$ iteratively by concatenating to it
rows from $A$. The algorithm will mark entries of $A$ as it continues to add rows. Keeping
track of these marks will help us decide which rows to add next. Initially all the entries of
$A$ are unmarked. The algorithm proceeds in $k$ steps. At step $i$ ($i$ goes from 1 to $k$) the
algorithm picks $n$ rows from $A$ and adds them to $B$. These $n$ rows are chosen as follows: For
every $j \in \{1, \ldots, n\}$ pick a row that has an unmarked non-zero entry in the $j$'th column and
mark this non-zero entry. The reason why such a row exists at all steps is that each column
contains at least $k$ non-zero entries, and in each step we mark at most one non-zero entry in
each column.

Claim 5.2.6. The matrix $B$ obtained by the algorithm has non-zero diagonal and each row
of $A$ is added to $B$ at most $q$ times.

Proof. The $n$ rows added at each of the $k$ steps form an $n \times n$ matrix with non-zero diagonal.
The bound on the number of times each row is added to $B$ follows from the fact that each
row has at most $q$ non-zero entries and each time we add a row to $B$ we mark one of its
non-zero entries. □
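
The marking algorithm above can be sketched as follows. The function name `build_B` and the small $4 \times 3$ example matrix are hypothetical; the sketch assumes, as in the text, that every column of $A$ has at least $k$ non-zero entries, and it returns the indices of the chosen rows rather than the rows themselves.

```python
from collections import Counter


def build_B(A, k):
    """Greedy row selection: k rounds, each picking one row of A per column.

    Returns the list of chosen row indices (n per round, n * k in total).
    Assumes every column of A contains at least k non-zero entries.
    """
    m, n = len(A), len(A[0])
    marked = [[False] * n for _ in range(m)]
    chosen = []
    for _ in range(k):
        for j in range(n):
            # any row with an unmarked non-zero entry in column j will do
            i = next(i for i in range(m) if A[i][j] != 0 and not marked[i][j])
            marked[i][j] = True
            chosen.append(i)
    return chosen


# Hypothetical example: every column has 3 non-zero entries, every row at most q = 3.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 1, 1]]
k = 2
chosen = build_B(A, k)                 # [0, 0, 1, 2, 1, 2]
multiplicity = Counter(chosen)         # how often each row of A was used
```

Within each block of $n$ consecutive choices, the $j$'th chosen row has a non-zero entry in column $j$, which is exactly the non-zero diagonal property of Claim 5.2.6.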



Since the matrix $B$ is obtained from the rows of $A$ its rank is at most the rank of $A$. Fix
some $\varepsilon > 0$ (which will later tend to zero). Applying Corollary 5.2.5 we get a scaling $B'$ of
$B$ such that the $\ell_2$-norm of each row is at most $\sqrt{1+\varepsilon}$ and the $\ell_2$-norm of each column is at
least $\sqrt{k-\varepsilon}$.

Our final step is to argue about the rank of $B'$ (which is at most the rank of $A$). To this
end, consider the matrix

$$M = (B')^* \cdot B',$$

where $(B')^*$ is the conjugate transpose of $B'$. Then $M = (m_{ij})$ is an $n \times n$ Hermitian matrix. The
diagonal entries of $M$ are exactly the squares of the $\ell_2$-norms of the columns of $B'$. Therefore,

$$m_{ii} \ge k - \varepsilon$$

for all $i \in [n]$.

We now upper bound the off-diagonal entries. The off-diagonal entries of $M$ are the
inner products of different columns of $B'$. The intersection of the supports of each pair of
different columns has size at most $tq$ since, in $A$, the intersections had size at most $t$, and each row in
$A$ is repeated at most $q$ times in $B$ (which has the same support as $B'$). The norm of each
row is at most $\sqrt{1+\varepsilon}$. For every two real numbers $\alpha, \beta$ so that $\alpha^2 + \beta^2 \le 1 + \varepsilon$ we have
$|\alpha \cdot \beta| \le 1/2 + \varepsilon'$, where $\varepsilon'$ tends to zero as $\varepsilon$ tends to zero. Therefore

$$|m_{ij}| \le tq \cdot (1/2 + \varepsilon')$$

for all $i \ne j$. Applying Lemma 5.2.1 we get that

$$\mathrm{rank}(A) \ge \mathrm{rank}(B') \ge \mathrm{rank}(M) \ge n - \left( \frac{q \cdot t \cdot n \, (1/2 + \varepsilon')}{k - \varepsilon} \right)^2.$$

Since this holds for all $\varepsilon > 0$ it holds also for $\varepsilon = 0$, which gives the required bound on the
rank of $A$. □
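
As a worked version of this last step (a sketch derived only from the two entry bounds above, with $L = k - \varepsilon$ and $\ell = tq(1/2 + \varepsilon')$ playing the roles of the diagonal and off-diagonal bounds in Lemma 5.2.1), letting $\varepsilon$, and with it $\varepsilon'$, tend to zero gives:

```latex
\mathrm{rank}(A) \;\ge\; \lim_{\varepsilon \to 0}
   \left[ n - \left( \frac{q\,t\,n\,(1/2+\varepsilon')}{k-\varepsilon} \right)^{2} \right]
 \;=\; n - \left( \frac{q\,t\,n}{2k} \right)^{2}.
```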

5.2.4 Proof of the matrix scaling theorem 

For simplicity we will prove the theorem for a square $n \times n$ matrix $A$ (it is easy to modify
the proof to fit the more general case). We wish to find a scaling of $A$ with row/column
sums approaching 1. We will call a scaling with row/column sums exactly 1 a 'good' scaling.
Since the constant 1 is arbitrary we will call a scaling 'good' even if its row/column sums are
all equal to some other constant. We will first give a condition on the pattern of zeros/non-zeros
of $A$ that guarantees $A$ has a good scaling. Then we will argue about matrices with non-zero
diagonal (which might not have a good scaling as we saw above) and obtain approximate
scalings for all $\varepsilon > 0$.

We first set up some convenient notations. Let

$$S = \mathrm{supp}(A) = \{(i,j) \in [n]^2 \mid A_{ij} \ne 0\}.$$

For $s = (i,j) \in [n]^2$ we will denote $A_s = A_{ij}$. For each $s = (i,j) \in [n]^2$ let $e_s \in \mathbb{R}^{2n}$ be a
vector with 1 at positions $i$ and $n+j$ and zeros everywhere else. We think of vectors in $\mathbb{R}^{2n}$
as divided into two parts: the first $n$ coordinates corresponding to rows of $A$ and the last $n$
coordinates to columns of $A$. Thus, the vector $e_s$ corresponds to the $(i,j)$'th entry of $A$. We
also define $u \in \mathbb{R}^{2n}$ to be the vector with all entries equal to $1/n$. Also, let $\mathbf{1} \in \mathbb{R}^{2n}$ be the
all-ones vector and notice that $\mathbf{1} \cdot u = \mathbf{1} \cdot e_s = 2$ for all $s \in [n]^2$ (where $x \cdot y$ denotes the standard
inner product).

With these notations in place we can state our scaling problem in a nicer form. Since we
are looking for positive scaling coefficients, it is convenient to treat all of them as exponential
functions. Thus, we will find row/column coefficients $\rho_1, \ldots, \rho_n, \gamma_1, \ldots, \gamma_n$ and the scaling
defined by them will multiply the rows/columns by $\exp(\rho_i)$, $\exp(\gamma_j)$ (which are always
positive). Solving for these exponents will make the problem easier to analyze as we shall now
see.



Claim 5.2.7. There exists a good scaling of $A$ iff there exists $x \in \mathbb{R}^{2n}$ such that

$$\sum_{s \in S} (A_s \exp(x \cdot e_s)) \, e_s = u. \qquad (5.2)$$

Proof. Consider the $i$'th position in $u$. The above equality implies

$$\sum_{j \in [n]} A_{ij} \exp(x_i) \exp(x_{n+j}) = 1/n$$

and so, in the scaling with row coefficients $x_1, \ldots, x_n$ and column coefficients $x_{n+1}, \ldots, x_{2n}$, the row
sums are all $1/n$. Similarly, the column sums are also $1/n$ and so every $x$ satisfying (5.2)
gives a good scaling. Conversely, given a good scaling with positive coefficients, we can take
logarithms and find an $x$ solving (5.2). □

Notice that (5.2) implies that $u$ must be in the convex hull of the vectors $e_s, s \in S$. To
see this, take the inner product of both sides with $(1/2)\mathbf{1}$: this implies that the sum of coefficients in the
linear combination of the $e_s$'s is one. Thus, this is also a necessary condition for having a good
scaling. What we will show below is that this is also a sufficient condition for approximate
scaling and that, if $u$ is in the interior of the convex hull (i.e., if there is a convex combination
of the $e_s$'s with all coefficients non-zero) then there is a 'good' scaling. Below, we will prove
the following lemma, which is the hardest part of the proof.

Lemma 5.2.8. For all vectors $v \in \mathbb{R}^{2n}$ such that $\mathbf{1} \cdot v = \mathbf{1} \cdot e_s$ for all $s \in S$ the following
holds: If $v$ is in the interior of the convex hull of the vectors $e_s, s \in S$, then there exists an
$x \in \mathbb{R}^{2n}$ satisfying

$$\sum_{s \in S} (A_s \exp(x \cdot e_s)) \, e_s = v. \qquad (5.3)$$

If $v = u = (1/n)\mathbf{1}$, this condition implies that $A$ has a good scaling.

We defer the proof to the subsection below and continue with the proof of the matrix scaling
theorem. Since $A$ has a non-zero diagonal, the vector $u$ is in the convex hull of $e_s, s \in S$
(just take the combination of $e_s$ with $s = (i,i)$ and coefficients $1/n$). However, it might not
be in the interior and so we cannot hope to find a good scaling. Nevertheless, for every $\varepsilon > 0$
there is a vector $u'$ of distance at most $\varepsilon$ from $u$ that is in the interior. Thus, we can find a
scaling of $A$ with row/column sums equal to the entries of $u'$. This implies the existence of
an approximate scaling for every $\varepsilon$.



Proof of Lemma 5.2.8 

The idea is to define a convex function $f(x)$ such that, if $f$ has a minimizer (w.l.o.g. it is a
global minimizer since $f$ is convex), the vanishing of the gradient at that point implies the
equality in (5.3). Then we will show that $f$ attains a minimum by showing that its value
grows to infinity when $\|x\|$ does.
The function we will use is

$$f(x) = \ln \Big( \sum_{s \in S} A_s \exp(x \cdot e_s) \Big) - x \cdot v.$$

A simple application of Cauchy-Schwarz shows that $f$ is indeed a convex function. It is
straightforward to verify that the gradient of $f$ (the vector of $2n$ partial derivatives) is

$$\nabla f(x) = \frac{\sum_{s \in S} A_s \exp(x \cdot e_s) \, e_s}{\sum_{s \in S} A_s \exp(x \cdot e_s)} - v$$

and so, if $\nabla f(x) = 0$ then $x$ satisfies (5.3) up to a scalar multiple of $v$ (the scalar can be
normalized to one by adding a suitable multiple of $\mathbf{1}$ to $x$).
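
To make the role of $f$ concrete, here is a small numerical sketch in Python. It is an illustration under assumptions, not the method of the proof: plain gradient descent with a fixed step size is swapped in as the minimization procedure, and the names `f`, `grad_f`, `minimize`, the step size, and the $2 \times 2$ positive matrix are all choices made here. Indices are 0-based, so $e_s$ for $s = (i,j)$ has ones at positions $i$ and $n + j$.

```python
import math


def support(A):
    n = len(A)
    return [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0]


def f(A, x, v):
    """f(x) = ln(sum_s A_s exp(x . e_s)) - x . v."""
    n = len(A)
    Z = sum(A[i][j] * math.exp(x[i] + x[n + j]) for (i, j) in support(A))
    return math.log(Z) - sum(xi * vi for xi, vi in zip(x, v))


def grad_f(A, x, v):
    """Gradient of f: the normalized weighted sum of the e_s, minus v."""
    n = len(A)
    weights = {(i, j): A[i][j] * math.exp(x[i] + x[n + j]) for (i, j) in support(A)}
    Z = sum(weights.values())
    g = [-vi for vi in v]
    for (i, j), w in weights.items():
        g[i] += w / Z
        g[n + j] += w / Z
    return g


def minimize(A, v, steps=500, step_size=0.5):
    """Plain gradient descent from x = 0 (an assumed, simple choice of method)."""
    x = [0.0] * (2 * len(A))
    for _ in range(steps):
        g = grad_f(A, x, v)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


A = [[1.0, 2.0], [3.0, 4.0]]
n = len(A)
u = [1.0 / n] * (2 * n)
x = minimize(A, u)
Z = sum(A[i][j] * math.exp(x[i] + x[n + j]) for i in range(n) for j in range(n))
# Scaling of A by exp(x_i) * exp(x_{n+j}), normalized by Z: row/column sums are 1/n.
scaled = [[A[i][j] * math.exp(x[i] + x[n + j]) / Z for j in range(n)] for i in range(n)]
```

At the computed point the gradient is numerically zero, and the matrix `scaled` has all row and column sums equal to $1/n$, that is, it is a 'good' scaling in the sense used above.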

We now show that $f$ goes to infinity when $\|x\|$ does. This is not precisely true: let

$$F = \{ y \in \mathbb{R}^{2n} \mid y \cdot (e_s - e_{s'}) = 0 \ \ \forall s, s' \in S \}$$

be the subspace of vectors $y$ for which $y \cdot e_s$ is constant for all $s \in S$. Observe that $f(x + y) =
f(x)$ for all $x \in \mathbb{R}^{2n}$ and all $y \in F$. Let $E = F^{\perp}$ be the orthogonal complement of $F$. Hence, we can
think of $f$ as a function on $E$ and use the fact that $f$ has a minimizer on $\mathbb{R}^{2n}$ iff it has one
on $E$. The fact that $f$ has a minimizer on $E$ will follow from the following claim:



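Claim 5.2.9. There exist a constant $C_1 \in \mathbb{R}$ and a positive constant $C_2 \in \mathbb{R}$ such that for
all $x \in E$ we have

$$f(x) \ge C_1 + C_2 \|x\|.$$

Proof. Notice that, since $E \cap F = \{0\}$, we have that for all nonzero $x \in E$ the following quantity

$$\Delta(x) = \max_{s \in S} x \cdot e_s - \min_{s \in S} x \cdot e_s$$

is positive. Let

$$\alpha = \min_{x \in E, \|x\| = 1} \Delta(x) > 0.$$

Notice that, for all $x \in E$ we have $\Delta(x) \ge \alpha \|x\|$.

To prove the claim, fix some $x \in E$. Let $s_m, s_M \in S$ be such that $\Delta(x) = x \cdot e_{s_M} - x \cdot e_{s_m}$.
Since $v$ is in the interior of the convex hull of $e_s, s \in S$, there are strictly positive coefficients
$\lambda_s \in \mathbb{R}, s \in S$, such that $v = \sum_{s \in S} \lambda_s e_s$ and $\sum_{s \in S} \lambda_s = 1$. The following calculation completes
the proof:

$$\begin{aligned}
f(x) &\ge \ln(A_{s_M} \exp(x \cdot e_{s_M})) - x \cdot v \\
&= C_1 + x \cdot e_{s_M} - x \cdot v \\
&= C_1 + \sum_{s \in S} \lambda_s (x \cdot e_{s_M} - x \cdot e_s) \\
&\ge C_1 + \lambda_{s_m} (x \cdot e_{s_M} - x \cdot e_{s_m}) \\
&= C_1 + \lambda_{s_m} \Delta(x) \\
&\ge C_1 + \lambda_{s_m} \alpha \|x\|.
\end{aligned}$$

Here $C_1 = \ln A_{s_M}$ and the coefficient $\lambda_{s_m}$ depend on $x$ only through the choice of $s_M, s_m \in S$.
Since $S$ is finite, replacing them by $\min_{s \in S} \ln A_s$ and $\min_{s \in S} \lambda_s$ gives constants that do not depend on $x$. □

This completes the proof of the lemma.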



5.3 Sylvester-Gallai over finite fields

Let $\mathbb{F}$ denote a finite field of $q$ elements. We can extend our definition of SG configurations
(and $\delta$-SG) to finite fields. To simplify matters we will replace the collinearity condition with
linear dependence (i.e., we will assume we have many dependent triples). This will require
us to assume that no two points are multiples of each other (or they will be dependent with
any third point). We will call a set of points $v_1, \ldots, v_n \in \mathbb{F}^d$ a proper set if no two points are
a constant multiple of each other and the zero point is not in the set (so a proper set is a
subset of projective space).

Definition 5.3.1 (SG configuration). Let $V = \{v_1, \ldots, v_n\} \subset \mathbb{F}^d$ be a proper set of points. $V$
is called an SG configuration if for every $i \ne j \in [n]$, there exists $k \in [n] \setminus \{i, j\}$ with $v_i, v_j, v_k$
linearly dependent. $V$ is a $\delta$-SG configuration, with $\delta \in [0, 1]$, if for each $i$ there are at least
$\delta n$ values of $j$ for which there exists $k$ s.t. $v_i, v_j, v_k$ are linearly dependent.
To simplify the presentation we will restrict ourselves to the case $\delta = 1$, where every pair
is in some dependent triple. The results we will prove can all be generalized to the more
general case, when $\delta$ can be any constant, in a straightforward way. We will mention along
the way which changes need to be made to handle arbitrary $\delta$.

Using this definition, one can ask the same question as before: 'what is the largest possible
dimension of an SG configuration of $n$ points?'. To see that the answer has to be different from the
real/complex case notice that the set $V = \mathbb{F}^d$ (taking one representative from each line
through the origin) is an SG configuration and so we could have $\dim(V) \ge \log_q n$ (with
$n = |V| \approx q^{d-1}$). Also, if $\mathbb{F}$ has characteristic $p$ we can take the set $V = \mathbb{F}_p^d$ (modulo constant
multiples) and get $\dim(V) \ge \log_p n$. We will prove two bounds: The first is a generic upper
bound of $\dim(V) \le O(\log_2 n)$ which holds over any field [GKST02, DS06]. The second result
will be a bound of the form $\dim(V) \le O(\log_p n) + \mathrm{poly}(p)$ over prime fields of size $p$ [BDSS11].
This second bound is asymptotically tight, as the $V = \mathbb{F}^d$ example shows, for any constant
$p$. When $p$ is a growing function of $n$ a bound of the form $O(\log_p n)$ is conjectured to exist$^1$.

Another way of stating these two bounds is as saying that, if $V \subset \mathbb{F}^d$ is an SG configuration
of dimension $d = \dim(V)$ then $|V| \ge 2^{\Omega(d)}$ (over any field) or $|V| \ge p^{\Omega(d)}$ when $\mathbb{F}$ is a prime
field of size $p$ (for $p$ sufficiently small compared to $d$). Thus, the size of the smallest SG configuration of dimension $d$
grows exponentially with $d$ (with the base of the exponent being larger for fields of larger
characteristic).

5.3.1 The $O(\log_2 n)$ bound

We will prove this bound in two stages. First we will prove it over $\mathbb{F}_2$ and then see how to
handle arbitrary fields in a similar way. Let $V = \{v_1, \ldots, v_n\}$ be an SG configuration in $\mathbb{F}_2^d$.
W.l.o.g. the dimension of $V$ is equal to $d$ and so we can perform a linear change of basis so
that $v_1, \ldots, v_d$ are the standard basis vectors $e_1, \ldots, e_d$ (with $e_i$ having one in coordinate $i$ and
zero elsewhere). We will only use the SG property for $v_1, \ldots, v_d$ (i.e., the fact that for each
$i \in [d]$ and each $j$ there is a $k$ s.t. $v_i, v_j, v_k$ are dependent). Therefore, the bound $O(\log_2 n)$
will hold also for this special case (and, in this case, it is tight over any field. See below).
Observe that, when $v_i = e_i$, if the triple $v_i, v_j, v_k$ is dependent then we have $e_i = v_j + v_k$
or, in other words, $v_j, v_k$ differ only in the $i$'th coordinate. Let $B = \{0, 1\}^d$ be the boolean
cube with edges going between vectors that differ in exactly one coordinate. Consider $V$ as
a subset of $B$ and let us try to estimate the number of edges of $B$ that connect two elements
of $V$. For each $i \in [d]$ we have at least $\Omega(n)$ edges in 'direction' $i$ (i.e., pairs that differ in the
$i$'th coordinate alone). This follows from the SG property and the discussion above. Thus, in
total we have at least $\Omega(n \cdot d)$ edges inside $V$. We now use the following lemma, which is
known as the 'edge isoperimetric inequality for the hypercube' (the bound we prove is not
the best possible but it will suffice for our purposes).

$^1$ We are not aware of any results for large finite fields of small characteristic, other than the general $\log_2 n$
bound.

Lemma 5.3.2. Let $S \subset B = \{0, 1\}^d$ be some subset of the boolean hypercube. Then there are
at most $|S| \log_2 |S|$ edges going between elements of $S$.

Proof. The proof is by induction on $d$, where the case $d = 1$ is trivial. Write $S_0$ for the set of
elements of $S$ with 1st bit equal to zero and let $S_1 = S \setminus S_0$. Let $E(S)$ denote the number
of edges in $S$ and let $E(S_0), E(S_1)$ be defined similarly. We can think of $S_0, S_1$ as subsets
of the $d-1$ dimensional cube and so, by induction, $E(S_0)$ and $E(S_1)$ are bounded by $|S_0| \log_2 |S_0|$ and
$|S_1| \log_2 |S_1|$ respectively. Observe that the edges in $S$ are divided into three disjoint sets:
the edges in $S_0$, the edges in $S_1$ and the edges between $S_0$ and $S_1$. This last set of edges has
size at most $\min\{|S_0|, |S_1|\}$ since each element in $S_0$ can have at most one neighbor in $S_1$
and vice versa. We thus have

$$E(S) \le |S_0| \log_2 |S_0| + |S_1| \log_2 |S_1| + \min\{|S_0|, |S_1|\}.$$

Let $m = |S|$ and consider the function

$$f(x) = x \log_2 x + (m - x) \log_2 (m - x) + x$$

in the range $0 \le x \le m/2$ (we think of $x$ as being equal to $\min\{|S_0|, |S_1|\}$, with the convention $0 \log_2 0 = 0$). Using some
basic calculus we see that $f(x)$ is convex on this range and is therefore maximized at an end point. Since
$f(0) = m \log_2 m$ and $f(m/2) = m \log_2 m - m/2$, this implies $E(S) \le m \log_2 m$ as was required. □
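
The inequality of the lemma can be checked exhaustively for a very small cube. The sketch below is only a sanity check, not a substitute for the proof; the function names `edges_inside` and `check_lemma` are hypothetical, and the enumeration over all subsets is feasible only for tiny $d$.

```python
from itertools import combinations
from math import log2


def edges_inside(S):
    """Number of hypercube edges with both endpoints in S (vertices are bit tuples)."""
    S = set(S)
    return sum(1 for u, v in combinations(S, 2)
               if sum(a != b for a, b in zip(u, v)) == 1)


def check_lemma(d):
    """Verify E(S) <= |S| log2 |S| for every non-empty subset S of {0,1}^d."""
    cube = [tuple((x >> b) & 1 for b in range(d)) for x in range(2 ** d)]
    for mask in range(1, 2 ** len(cube)):
        S = [cube[t] for t in range(len(cube)) if (mask >> t) & 1]
        if edges_inside(S) > len(S) * log2(len(S)):
            return False
    return True


full_cube = [tuple((x >> b) & 1 for b in range(3)) for x in range(8)]
```

For the full 3-dimensional cube, `edges_inside(full_cube)` counts its 12 edges, comfortably below $8 \log_2 8 = 24$.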

Using the lemma we get

$$n \cdot d \le O(n \cdot \log_2 n)$$

or $d = \dim(V) \le O(\log_2 n)$.

Now consider an arbitrary field $\mathbb{F}$ (not necessarily finite) and suppose $V \subset \mathbb{F}^d$ with
$d = \dim(V)$. We will show how to use $V$ to find a subset of the boolean cube $B$ that has
roughly $n \cdot d$ edges. Suppose $e_i, v_j, v_k$ is a dependent triple as before. Now, there exist nonzero
field coefficients $a, b$ such that $e_i = a v_j + b v_k$. This, however, does not imply that $v_j, v_k$ differ
in only the $i$'th coordinate. To be able to derive such a conclusion we would like to have
$a = -b$. This would be true if we knew that $v_j$ and $v_k$ have the same non-zero value in some coordinate
other than $i$. To make this happen (in most triples) we will normalize each $v_i$ so that its first
non-zero coordinate is 1. More formally, let $f(v) \in [d]$ be the minimal $\ell \in [d]$ such that the $\ell$'th
coordinate of $v$ is non-zero. We can multiply each $v_i$ by a constant so that $(v_i)_{f(v_i)} = 1$ for all
$i$ (clearly this keeps the SG property intact). Now, for each $i$ we have a set $M_i$ of $\approx n$ pairs
$v_j, v_k$ so that $e_i$ is spanned by $v_j, v_k$. Call a pair $(v_j, v_k) \in M_i$ 'good' if both $f(v_j)$ and $f(v_k)$
are not $i$. If $e_i$ is spanned by a good pair $(v_j, v_k) \in M_i$ then we must have $f(v_j) = f(v_k)$ and
so, by the above, $e_i = a v_j - a v_k$ and so $v_j, v_k$ differ only in the $i$'th coordinate.

Claim 5.3.3. There are at least $\Omega(n \cdot d)$ good pairs (in all of $M_1, \ldots, M_d$ together).

Proof. The total number of pairs is $\Omega(n \cdot d)$ and so we only have to bound the number of 'bad'
pairs. Each vector $v_j$ can be 'responsible' for a bad pair in only one of the $M_i$'s, namely in
$M_{f(v_j)}$. Therefore, the total number of bad pairs is bounded by $O(n)$. This completes the
proof (assuming $d$ is larger than some absolute constant). □



We will now reduce to the binary case by mapping each field element randomly to either
0 or 1. Every good pair will remain good with probability at least $1/2$ and so (using
expectations) we can find a set $V' \subset B$ (which might be smaller than $V$) that has at least $\Omega(n \cdot d)$
edges in $B$. The bound now follows from the isoperimetric inequality.

Remark 1: The same proof as above works for $\delta$-SG configurations using the lower bound
$\Omega(\delta d n)$ on the number of edges inside $V$ and gives an $O(\delta^{-1} \log_2 n)$ upper bound on the
dimension.

Remark 2: To see that the $O(\log_2 n)$ bound is tight for the special case we considered
(when we only use dependent triples containing $e_1, \ldots, e_d$) take $V = \{0, 1\}^d \subset \mathbb{F}^d$ where $\mathbb{F}$ is
any field. For every $e_i$ and every $v \in V$ one of the vectors $v + e_i$ or $v - e_i$ is in $V$ and so we
have an SG configuration.
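
The observation in Remark 2 is easy to check mechanically. In the sketch below (function name hypothetical) integer arithmetic stands in for the field, which is harmless here because only the entries 0 and 1 matter: if the $i$'th entry of $v$ is 0 then $v + e_i$ stays in the cube, and if it is 1 then $v - e_i$ does.

```python
from itertools import product


def is_restricted_sg(d):
    """Check that for every e_i and every v in {0,1}^d, v + e_i or v - e_i lies in {0,1}^d."""
    V = set(product((0, 1), repeat=d))
    for i in range(d):
        for v in V:
            plus = tuple(c + (1 if t == i else 0) for t, c in enumerate(v))
            minus = tuple(c - (1 if t == i else 0) for t, c in enumerate(v))
            if plus not in V and minus not in V:
                return False
    return True
```

With $n = 2^d$ points the dimension is exactly $\log_2 n$, which is why the bound cannot be improved in this restricted setting.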

5.3.2 The $O(\log_p n)$ bound over prime fields

For the rest of this section $\mathbb{F}$ will denote a finite field of prime size $p$. The example $V = \{0, 1\}^d$
shows that, to prove the stronger bound of $O(\log_p n)$ we must go beyond the isoperimetric
inequality. The new ideas in the proof will come from additive combinatorics. The SG
property can be translated into bounds on the additive growth of the set $V$ (up to some
scaling) and these bounds will be exactly those appearing in the Balog-Szemerédi-Gowers
theorem, encountered in Section 3.3. We rephrase this theorem here in a slightly different
form (whose proof is a simple reduction to the one we saw).

Theorem 5.3.4. [Balog-Szemerédi-Gowers] Let $A \subset G$ be a set of size $N$ in an abelian group
$G$. Suppose that

$$|\{(a_1, a_2) \in A^2 \mid a_1 + a_2 \in A\}| \ge N^2 / K.$$

Then, there exists a subset $A' \subset A$ with $|A'| \ge N / K^c$ and with $|A' + A'| \le K^c N$, where $c > 0$
is some absolute constant.

Another ingredient we will need is the following important result of Ruzsa: 

Theorem 5.3.5 (Ruzsa [Ruz96b]). Let $A \subset \mathbb{F}^d$ be such that $|A + A| \le K|A|$. Then, there
exists a subspace $W \subset \mathbb{F}^d$ containing $A$ with $|W| \le K^c p^{K^c} |A|$, where $c$ is an absolute constant.
This implies $\dim(A) \le \log_p |W| \le \log_p |A| + K^{c'}$ for some other constant $c'$.
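
The quantities in Ruzsa's theorem can be computed directly for small sets. The sketch below is an illustration only: the helper names `sumset` and `span`, the prime $p = 3$ and the set $A = \{0,1\}^2$ are assumptions made here. Over a prime field the linear span of $A$ equals its closure under addition, which is what `span` computes.

```python
from itertools import product


def sumset(A, B, p):
    """A + B in F_p^d, with vectors represented as tuples of residues mod p."""
    return {tuple((a + b) % p for a, b in zip(u, v)) for u in A for v in B}


def span(A, p):
    """Additive closure of A over F_p (for prime p this is the linear span)."""
    W = {tuple(0 for _ in next(iter(A)))}
    while True:
        bigger = W | sumset(W, A, p)
        if bigger == W:
            return W
        W = bigger


p = 3
A = set(product((0, 1), repeat=2))      # |A| = 4
doubling = len(sumset(A, A, p)) / len(A)
W = span(A, p)
```

In this toy case $|A + A| = 9$, so $K = 9/4$, and the span $W$ is all of $\mathbb{F}_3^2$ with $|W| = 9$.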

We will prove Ruzsa's theorem below and continue with our proof of the $O(\log_p n)$ bound
for SG configurations. The first part of the proof will use the above two theorems to find a
large subset of $V$ of small dimension.

Lemma 5.3.6 (Small dim subset). Let $V = \{v_1, \ldots, v_n\} \subset \mathbb{F}^d$ be an SG configuration. Then
there exists a subset $V' \subset V$ with $|V'| \ge n / p^c$ such that $\dim(V') \le \log_p n + p^c$ for some
constant $c > 0$.

Proof. Let $A = \{\lambda v_i \mid i \in [n], \lambda \in \mathbb{F}^*\}$ be the set of size $(p-1)n$ containing all non-zero constant
multiples of elements from $V$ (recall that no two elements of $V$ are constant multiples of each
other). Every dependent triple $v_i, v_j, v_k$ in $V$ with, say, $a_i v_i + a_j v_j = a_k v_k$ implies that the
sum of the two elements $a_i v_i, a_j v_j$ (both in $A$) is also in $A$. Using the SG property and this
observation we get that, for each $a \in A$ there are at least $\Omega(|A|/p)$ elements $a' \in A$ such that
$a + a' \in A$. Using the BSG theorem we get that there is a subset $A' \subset A$ of size $|A'| \ge |A| / p^c$
such that $|A' + A'| \le p^c |A|$. Ruzsa's theorem now implies that

$$\dim(A') \le \log_p |A'| + p^c \le \log_p n + \mathrm{poly}(p).$$

We can now take $V' \subset V$ to be the set of all elements that have some multiple in $A'$. The size of
$V'$ is at least $|A'|/p$ and its dimension is bounded by that of $A'$. This completes the proof. □

We will now show how to 'grow' the set $V'$ so that it contains the entire set $V$ without
increasing its dimension by much. Let $V'$ be a subset of $V$ given by the Lemma. W.l.o.g. we
may assume that $\mathrm{span}(V') \cap V = V'$ (otherwise replace $V'$ with $\mathrm{span}(V') \cap V$). Let $w \in V \setminus V'$
be some element not in $V'$ (and so also not in the span of $V'$). Using the SG property we
know that for each $v' \in V'$ there is some $u \in V$ such that $w, v', u$ are dependent. Since $w$ is
not spanned by $V'$ we cannot have $u \in V'$. Thus, we can define a function $f : V' \to V \setminus V'$
such that for all $v' \in V'$ we have $w, v', f(v')$ dependent. Observe that if $f(v') = f(v'') = u$
then both $v'$ and $v''$ are in the span of $w, u$ which has size at most $p^2$. This implies that the
set of images $f(V')$ has size at least $|V'|/p^2$. Now, let $V'' = \mathrm{span}(V' \cup \{w\}) \cap V$. I.e., add $w$ to
$V'$ and take the span of the resulting set inside $V$. Clearly $\dim(V'') = \dim(V') + 1$. We also
add all the elements of $f(V')$ to the set $V''$ since they are spanned by $w$ and some element
of $V'$. This means that $|V''| \ge |V'|(1 + 1/p^2)$. Continuing in this manner $\mathrm{poly}(p)$ times we
will eventually add all the elements of $V$ and the dimension will grow by an additive term
of $\mathrm{poly}(p)$. This implies that $\dim(V) \le \log_p n + \mathrm{poly}(p)$ as was required.

Proof of Ruzsa's theorem 

Let $k \cdot A = A + \ldots + A$ ($k$ times). For $g \in G$ we denote by $A + g = \{a + g \mid a \in A\}$. We can
assume w.l.o.g. that $A = -A = \{-a \mid a \in A\}$ since otherwise we can replace $A$ with $A \cup -A$,
which will also not grow much under addition, using Ruzsa calculus.

Consider a maximal integer $r$ such that there exist $b_1, \ldots, b_r \in 3 \cdot A$ for which the $r$ sets
$A + b_i, i \in [r]$, do not intersect each other. Notice that each of these $r$ sets is contained in
the set $4 \cdot A$, which, using Ruzsa calculus, has size at most $K^c |A|$. Thus, $r \le K^c$. By
construction, we have that for every $b \in 3 \cdot A$ the set $b + A$ intersects $b_i + A$ for some $i \in [r]$.
Such an intersection implies that $b \in A - A + b_i = 2 \cdot A + b_i$. We can thus conclude that

$$3 \cdot A \subset \bigcup_{i \in [r]} (2 \cdot A + b_i).$$

Iterating, this means that

$$k \cdot A \subset 2 \cdot A + \mathrm{span}(b_1, \ldots, b_r)$$

for all $k$. Thus, the span of $A$ has size at most

$$|\mathrm{span}(A)| \le |2 \cdot A| \cdot p^r \le K^c p^{K^c} \cdot |A|.$$

This concludes the proof.

5.4 Locally Correctable Codes 

We will now see how the question of bounding the dimension of $\delta$-SG configurations comes
up naturally in the context of error correction.

5.4.1 Error Correcting Codes

We start by defining Error Correcting Codes (ECCs). We will focus on linear ECCs since
these are the most well studied. One way to view an ECC is as a subspace $C \subset \mathbb{F}^n$, where $\mathbb{F}$
is some finite field. We say that the code has minimum distance $D$ if for all $x \ne y \in C$ the
Hamming distance (the number of different coordinates) between $x$ and $y$, denoted $\Delta(x, y)$, is
at least $D$. We will sometimes also refer to the normalized minimum distance, which is the minimum
distance divided by $n$. The rate of the code $C$ is defined as $r(C) = \dim(C)/n$. Codes with
high rate and high distance can be used to transmit messages in the presence of errors. More
precisely, suppose $\dim(C) = d$ and let $E_C : \mathbb{F}^d \to \mathbb{F}^n$ be a linear mapping whose image is
$C$ (thus, $E_C$ is a bijection onto its image). To send a message $x \in \mathbb{F}^d$ we send its encoding
$y = E_C(x)$ instead. Now, suppose that the transmission is noisy and that the actual received
string was not $y$, but some $y' \in \mathbb{F}^n$ with $\Delta(y, y') < D/2$. The receiver could then determine
$y$ (and from it, $x$) uniquely from $y'$ since there could not be two distinct $y_1, y_2 \in C$ with both
$\Delta(y_1, y')$ and $\Delta(y_2, y')$ smaller than $D/2$ (this would imply $\Delta(y_1, y_2) < D$).

A nice example of an error correcting code is the Reed-Solomon code: We take $\mathbb{F}$ to be
a finite field of size at least $n$ and fix some $n$ distinct field elements $a_1, \ldots, a_n \in \mathbb{F}$. Fixing $d$
to be any integer between 1 and $n$ we let $C$ be the following subspace:

$$C = \{(f(a_1), \ldots, f(a_n)) \in \mathbb{F}^n \mid f \in \mathbb{F}[T] \text{ has } \deg(f) \le d\}.$$

I.e., $C$ is the subspace of $n$-tuples that are the evaluations of degree $\le d$ univariate polynomials
on $n$ distinct points in the field. The minimum distance of this code can be readily computed
since two distinct polynomials of degree $\le d$ can agree on at most $d$ places. This means that the
distance between two distinct vectors in $C$ is at least $n - d$. Thus, if we take, for example,
$d = n/2$ we will get a code of rate $\approx 1/2$ and (normalized) minimum distance $1/2$. Such codes,
with constant rate and distance, are sometimes called 'good' codes (for obvious reasons).
Good codes over smaller alphabets (ideally, for applications, with $|\mathbb{F}| = 2$) can
also be obtained using additional ideas. It is important to note here that taking $C$ to be a
random subspace of dimension $d$ will result, with high probability, in a good code. However,
this type of construction will not give us any efficient way to perform the decoding (other
than going over all elements of $C$).
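
The distance claim for Reed-Solomon codes can be checked by brute force on a toy instance. The sketch below assumes a prime field $\mathbb{F}_7$, evaluation at all $n = 7$ field elements, and polynomials of degree at most $d = 2$; the function names are hypothetical. Since the code is linear, its minimum distance equals the minimum Hamming weight of a non-zero codeword.

```python
from itertools import product


def rs_encode(coeffs, points, p):
    """Evaluate the polynomial with the given coefficients (low degree first) at each point, mod p."""
    return tuple(sum(c * pow(a, e, p) for e, c in enumerate(coeffs)) % p for a in points)


def min_distance(d, points, p):
    """Brute-force minimum Hamming weight over all non-zero polynomials of degree <= d."""
    best = len(points)
    for coeffs in product(range(p), repeat=d + 1):
        if any(coeffs):
            word = rs_encode(coeffs, points, p)
            best = min(best, sum(1 for y in word if y != 0))
    return best


p, d = 7, 2
points = list(range(p))                       # n = 7 distinct field elements
codeword = rs_encode((1, 0, 1), points, p)    # evaluations of f(T) = 1 + T^2
```

Here `min_distance(d, points, p)` returns $5 = n - d$, attained for instance by $f(T) = T(T-1)$, which vanishes at exactly two of the evaluation points.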

Coding theory is a vast area of research spanning Engineering, Computer Science and
Mathematics and we will not attempt to give a full introduction here. The basic questions
on existence/constructions of ECCs of the form described above have, for the most part,
satisfactory (if not complete) answers. Our focus will be a specific kind of ECC, Locally
Correctable Codes (LCCs), that are very poorly understood and tightly related to questions
regarding SG configurations. LCCs are variants of Locally Decodable Codes (LDCs), first
defined and studied in a paper by Katz and Trevisan [KT00], and much of the discussion
below appeared in that seminal work.

5.4.2 Locally Correctable Codes 

In the usual ECC setting, the decoder takes a received word $y' \in \mathbb{F}^n$, runs some sophisticated
algorithm on $y'$ and returns the unique $y \in C$ which minimizes $\Delta(y, y')$. For example, in
Reed-Solomon codes, given a noisy list of values of a polynomial of low degree, we want to
interpolate the unique polynomial that agrees with this list in the largest number of places.
This type of decoding algorithm is usually very 'global', meaning that if one wanted to
compute even one coordinate in the 'corrected' $y$, they would still need to compute the entire
$y$ (and then output a single coordinate). Locally Correctable Codes allow the receiver to
recover $y$ from $y'$ in a more local way: The decoder can, given an index $i \in [n]$, recover the
$i$'th coordinate of the unique closest $y \in C$, looking at a small random sample of positions in
$y'$. Such a decoding procedure cannot always be correct (since the few places we look at might
all contain errors), but it could be correct w.h.p. over the choices of the coordinates we choose
to read. To make the connection to SG configurations clearer we will define a code $C \subset \mathbb{F}^n$
of dimension $d$ as an ordered list of vectors $V = (v_1, \ldots, v_n) \in (\mathbb{F}^d)^n$ (possibly containing
repetitions), each corresponding to a single coordinate in $[n]$. Given such $V$ we take as our
code the subspace

$$C_V = \{(\langle x, v_1 \rangle, \ldots, \langle x, v_n \rangle) \in \mathbb{F}^n \mid x \in \mathbb{F}^d\}. \qquad (5.4)$$

Notice that, in this way of writing things, if some $v_i$ is in the span of some other set of vectors
$\{v_{j_1}, \ldots, v_{j_r}\}$ in the list $V$ then the $i$'th position of any $y \in C_V$ can be recovered from the
positions $y_{j_1}, \ldots, y_{j_r}$. Simply write

$$v_i = \sum_{\ell=1}^{r} a_\ell v_{j_\ell}$$

and then we have

$$y_i = \langle x, v_i \rangle = \sum_{\ell=1}^{r} a_\ell \langle x, v_{j_\ell} \rangle = \sum_{\ell=1}^{r} a_\ell y_{j_\ell}.$$
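
The recovery rule above can be tried out on a tiny instance of the code $C_V$ from (5.4). The example below is an assumed illustration: $V$ is taken to be the list of all non-zero vectors of $\mathbb{F}_2^3$, so that every $v_i$ is the sum of two other vectors in the list in several disjoint ways, and the function names are hypothetical.

```python
from itertools import product
import random

d = 3
V = [v for v in product((0, 1), repeat=d) if any(v)]   # n = 7 vectors, one per coordinate


def encode(x):
    """Codeword of C_V: the inner products <x, v_i> over F_2."""
    return [sum(a * b for a, b in zip(x, v)) % 2 for v in V]


def recovery_pairs(i):
    """All pairs (j, k), j < k, with v_i = v_j + v_k over F_2; they are pairwise disjoint."""
    pairs = []
    for j in range(len(V)):
        if j == i:
            continue
        w = tuple((a + b) % 2 for a, b in zip(V[i], V[j]))
        k = V.index(w)          # w is non-zero because j != i
        if j < k:
            pairs.append((j, k))
    return pairs


def correct(received, i, rng=random):
    """Recover coordinate i by reading two other positions, chosen at random."""
    j, k = rng.choice(recovery_pairs(i))
    return (received[j] + received[k]) % 2
```

Each coordinate has three disjoint recovery pairs in this example, so a single corrupted position can spoil at most one of them, and `correct` returns the right value with probability at least $2/3$ over its random choice.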

We now give a formal definition of LCCs. We will allow the base field to be any field 
(even infinite). 

Definition 5.4.1. An $(r, \delta)$-LCC of dimension $d$ is an ordered list of vectors $V = (v_1, \ldots, v_n) \in
(\mathbb{F}^d)^n$ such that $\dim(v_1, \ldots, v_n) = d$ and with the following property, called the LCC property:
For each $i \in [n]$ and every set $S \subset [n]$ of size at most $\delta n$ there exists a set $R \subset [n] \setminus S$ with
$|R| \le r$ such that $v_i \in \mathrm{span}\{v_j \mid j \in R\}$. The parameter $r$ is called the query complexity of $V$.

It is not immediately obvious why this definition is the right one. True, for every set of
'errors' $S \subset [n]$ of size at most $\delta n$ there are $r$ positions outside this set (so the values there
are correct) which determine the $i$'th coordinate. So, if we 'knew' where the errors were, we
could locally correct any coordinate$^2$. But what if we do not know where the errors are? The
following simple and useful lemma will help us resolve this issue. To state the lemma we will
require the following definition:

Definition 5.4.2 (r-Matching). Let $\Omega$ be some finite set. A family of subsets
$M = \{R_1, \ldots, R_k\}$ with each $R_i \subset \Omega$ is called an r-Matching in $\Omega$ if

• For all $i \in [k]$, $1 \le |R_i| \le r$.

• For all $i \ne j \in [k]$, $R_i \cap R_j = \emptyset$.

We denote the size of the r-matching M by $|M| = k$. We call M a regular r-Matching if all
sets $R_i$ are of size exactly r. When r is obvious from the context we will sometimes omit it
and refer to M simply as a matching.

Lemma 5.4.3. Let $V = (v_1, \ldots, v_n) \in (\mathbb{F}^d)^n$ be an $(r, \delta)$-LCC. Then, for each
$i \in [n]$ there exists an r-Matching $M_i = \{R_{i,1}, \ldots, R_{i,k}\}$ in [n] with
$|M_i| = k \ge (\delta/r)n$ such that for every $i \in [n]$, $j \in [k]$ we have
$v_i \in \mathrm{span}\{v_\ell \mid \ell \in R_{i,j}\}$.

Proof. For each i we can construct the r-Matching $M_i$ iteratively: As long as $|M_i| < (\delta/r)n$
the r-tuples in $M_i$ can cover at most $\delta n$ other coordinates and so, applying the LCC
property with S equal to the set of covered coordinates, there has to be an r-tuple we can add
that is disjoint from all of them. □
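The iterative construction in the proof can be sketched as a greedy search. This is a minimal illustration, assuming a prime field $\mathbb{F}_p$ and brute-force span tests; the function names are hypothetical, and excluding the index i itself from the sets is a choice made for this sketch (the proof speaks of 'other' coordinates).

```python
from itertools import combinations, product

def in_span(target, vectors, p):
    """Brute-force test: is `target` an F_p-linear combination of `vectors`?"""
    dim = len(target)
    for coeffs in product(range(p), repeat=len(vectors)):
        combo = tuple(sum(c * v[t] for c, v in zip(coeffs, vectors)) % p
                      for t in range(dim))
        if combo == tuple(target):
            return True
    return False

def greedy_matching(V, i, r, p):
    """Greedily collect pairwise disjoint index sets R (1 <= |R| <= r, i not in R)
    such that V[i] lies in span{V[l] : l in R}."""
    used, matching = {i}, []
    while True:
        free = [j for j in range(len(V)) if j not in used]
        found = next((R for size in range(1, r + 1)
                      for R in combinations(free, size)
                      if in_span(V[i], [V[l] for l in R], p)), None)
        if found is None:
            return matching        # no further disjoint set exists
        matching.append(found)
        used.update(found)         # these coordinates are now covered

# Hypothetical toy list: all 7 nonzero vectors of F_2^3, in lexicographic order.
V = [v for v in product(range(2), repeat=3) if any(v)]
M3 = greedy_matching(V, i=3, r=2, p=2)   # V[3] = (1, 0, 0)
```

On this toy list the greedy search pairs up the six remaining vectors into three disjoint pairs, each summing to `V[3]`.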

In other words, for every $i \in [n]$ there is a large (at least $\Omega(n)$ if both $\delta$ and r
are constants) family of small disjoint sets of coordinates that determine the i'th coordinate.
Now, if the fraction of errors is at most $\delta' \ll \delta/r$ only a small fraction of the sets
in each $M_i$ will contain some corrupted coordinate and so a random set in $M_i$ will not contain
errors w.h.p., allowing for correction of the i'th position. This is stated precisely by the
following lemma, which justifies Definition 5.4.1 and connects it with our intuitive description
of LCCs.

² This kind of decoding is sometimes interesting in its own right.

Lemma 5.4.4. Let $V = (v_1, \ldots, v_n) \in (\mathbb{F}^d)^n$ be an $(r, \delta)$-LCC and let $C_V$ be
defined as in (5.4). Let $\delta' = \epsilon\delta/r$. Then, there exists an efficient randomized
algorithm³ $\mathrm{Dec} : \mathbb{F}^n \times [n] \to \mathbb{F}$ with the following properties:

• For all $y \in C_V$ and all $y' \in \mathbb{F}^n$ with $\Delta(y, y') \le \delta' n$ we have
$\Pr[\mathrm{Dec}(y', i) = y_i] \ge 1 - \epsilon$, where the probability is over the internal coin
tosses of Dec.

• For all $y' \in \mathbb{F}^n$ and all $i \in [n]$, the invocation of $\mathrm{Dec}(y', i)$ reads at most
r positions in the input y'.

Proof. $\mathrm{Dec}(y', i)$ will simply pick a random $j \in [k]$, where k is the size of the
r-Matching $M_i = \{R_{i,1}, \ldots, R_{i,k}\}$ given by Lemma 5.4.3, and compute $y_i$ from the
coordinates $\{y'_\ell \mid \ell \in R_{i,j}\}$ using the fact that $v_i \in \mathrm{span}\{v_\ell \mid \ell \in R_{i,j}\}$.
Since the distance $\Delta(y, y')$ is at most $\delta' n$ there could be at most
$\delta' n = \epsilon \cdot (\delta/r)n \le \epsilon |M_i|$ sets $R_{i,j} \in M_i$ that contain a coordinate
in which y and y' differ. Thus, with probability at least $1 - \epsilon$ the decoding will succeed. □
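The decoder from the proof can be sketched on a small example. The following is an illustration only, assuming the field $\mathbb{F}_2$ and the hypothetical toy list consisting of all nonzero vectors of $\mathbb{F}_2^3$, where every matching set is a pair $\{a, b\}$ with $v_a + v_b = v_i$; the function names are not from the text.

```python
import random
from itertools import product

def hadamard_list(d):
    """All nonzero vectors of F_2^d (a hypothetical toy choice of V)."""
    return [v for v in product(range(2), repeat=d) if any(v)]

def encode(x, V):
    """Codeword (<x, v_1>, ..., <x, v_n>) over F_2."""
    return [sum(a * b for a, b in zip(x, v)) % 2 for v in V]

def matching_for(V, i):
    """Disjoint pairs (a, b), both different from i, with V[a] + V[b] = V[i]."""
    index = {v: j for j, v in enumerate(V)}
    pairs, seen = [], {i}
    for a, va in enumerate(V):
        if a in seen:
            continue
        b = index[tuple(s ^ t for s, t in zip(va, V[i]))]
        pairs.append((a, b))
        seen.update((a, b))
    return pairs

def dec(y_received, i, V, rng=random):
    """Pick a random pair from the matching of i and read only those two positions."""
    a, b = rng.choice(matching_for(V, i))
    return y_received[a] ^ y_received[b]

V = hadamard_list(3)
y = encode((1, 0, 1), V)
y_bad = list(y)
y_bad[0] ^= 1   # corrupt a single position
# Count the pairs in the matching of i = 3 that still decode correctly.
good_pairs = sum((y_bad[a] ^ y_bad[b]) == y[3] for a, b in matching_for(V, 3))
```

With one corrupted position, only the single pair containing that position can fail, so a random pair decodes position 3 correctly with probability 2/3 in this toy case.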

When r is not constant (say $r = \log n$) LCCs are still interesting but the loss of a factor
$1/r$ in the decoding distance is no longer acceptable. A more restrictive definition of LCCs
can be made along the lines of the last lemma, requiring that there exists a decoding procedure
$\mathrm{Dec}(y', i)$ that returns $y_i$ with high probability in the presence of $\delta n$ errors,
reading only r positions. Here, we opt for the cleaner statement given in Definition 5.4.1.
Notice that, for the purpose of proving upper bounds on the dimension of an LCC V, our definition
is more general and upper bounds for our definition will imply upper bounds for the stronger
definition.

5.4.3 Random codes are not locally correctable 

The property of being able to decode symbols of the codeword locally is very appealing for
real life coding applications. However, for such codes to be used in practice their dimension,
which determines the amount of information they can encode, cannot be too small. There is
a huge gap between the known upper and lower bounds on the dimension of LCCs with small
r. This is surprising considering the good understanding we have of 'regular' ECCs (without
local correction). A partial explanation for this discrepancy is the fact that a random code
(of reasonable dimension) is not an LCC. More formally, suppose $|\mathbb{F}| = q$ and pick the list
$V = (v_1, \ldots, v_n)$ at random (i.e., pick each $v_i$ i.i.d. in $\mathbb{F}^d$). The probability that
any fixed $r + 1$ of the chosen vectors are dependent is at most $q^{r-d}$ (this bound is the
probability that the last vector is in the span of the previous r). This probability is
exponentially small when $r \ll d$ and so, unless $|V| \ge q^{\Omega(d)}$ (i.e., $\dim(V) = O(\log_q n)$)
we will not see even one dependent $(r+1)$-tuple. Since an LCC must have a quadratic number of
dependent $(r+1)$-tuples, this shows that a random code is not an LCC.⁴ Thus, the construction
of LCCs with high dimension and low query complexity is morally different from the construction
of regular ECCs. Constructing an ECC amounts to finding a structured example of an object that
exists almost everywhere. Constructing an LCC is a task of finding a very rare object with
extremely delicate local properties.

³ We assume our algorithm can perform field operations at unit cost.
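The counting behind this claim can be spelled out with a union bound. The following is a rough sketch that takes the per-tuple bound $q^{r-d}$ quoted above at face value; the explicit threshold on n is our own rearrangement and not a statement from the text.

```latex
\Pr\big[\exists \text{ a dependent } (r+1)\text{-tuple in } V\big]
  \;\le\; \binom{n}{r+1} \cdot q^{\,r-d}
  \;\le\; n^{r+1} \cdot q^{\,r-d},
\qquad \text{and} \qquad
n^{r+1} q^{\,r-d} < 1 \iff n < q^{(d-r)/(r+1)} .
```

Equivalently, for constant r one can only hope to see a dependent $(r+1)$-tuple once $d \le r + (r+1)\log_q n = O(\log_q n)$, which matches the regime stated in the text.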

5.4.4 2-Query LCCs and SG configurations 

The case r = 1 is not very interesting since it is easy to show that the best (and only)
$(1, \delta)$-LCC will have dimension at most $1/\delta$ (since every coordinate must repeat, up to
constant multiples, at least $\delta n$ times). The case r = 2 is already much more interesting and,
in this case, we have a pretty good understanding of the parameters obtainable by LCCs. This case
is also where the connection to SG-configurations will become clear. Suppose $\mathbb{F}$ is a finite
field of size q. A trivial construction of a $(2, \delta)$-LCC with constant $\delta$ is to take V to
contain all vectors in $\mathbb{F}^d$ (so that $n = q^d$). Not accidentally, this is also the trivial
construction of an SG-configuration. If it is not clear by now, let us state the following easy
Lemma:

Lemma 5.4.5. Suppose $V = \{v_1, \ldots, v_n\} \subset \mathbb{F}^d$ is a $\delta$-SG configuration (see
Section 5.1). Then the list $V = (v_1, \ldots, v_n)$ is a $(2, \delta/3)$-LCC.

Proof. Suppose $S \subset [n]$ is some set of size $|S| \le (\delta/3)n$ and fix some $i \in [n]$. We
need to show that there is a pair $j, k \in [n] \setminus S$ such that $v_i$ is spanned by $v_j, v_k$.
Using the $\delta$-SG property we know that there is a set $T \subset [n] \setminus S$ of size
$|T| \ge (\delta/2)n$ such that for each $j \in T$ there is some $k = k(j) \in [n]$ such that
$v_i, v_j, v_k$ are linearly dependent (recall that in V no two vectors are a constant multiple of
each other). If there is some $j \in T$ for which $k(j) \notin S$ we are done since $v_j, v_{k(j)}$
span $v_i$ and both are outside S. If for all $j \in T$ we have $k(j) \in S$ then there must be a
collision (since $|T| > |S|$) of the form $k(j) = k(j')$ and then both $v_j, v_{j'}$ are in the span
of $v_i, v_{k(j)}$. Since $v_j, v_{j'}$ are independent, they must also span $v_i$ and, again, we are
done since both are outside S. □

Hence, every $\delta$-SG configuration (over any field, not only finite fields) gives a 2-query LCC
of distance $\approx \delta$. But is the opposite also true? Can we take any $(2, \delta)$-LCC and
convert it to a $\delta'$-SG configuration with $\delta' \approx \delta$? I do not know the answer to this
question but suspect that it might be true. The main (and only) difficulty is that an LCC V is
a list that can have repetitions. It seems reasonable to conjecture that repetitions should not
be useful in creating a good LCC and that, perhaps with some careful combinatorial work, they can
be eliminated (at some negligible cost to the other parameters). Even though we do not know of
a black-box reduction from 2-query LCCs to SG configurations, all known upper bounds on
the dimension of SG-configurations extend (with significantly more work) to 2-query LCCs.
Suppose V is a $(2, \delta)$-LCC over a field $\mathbb{F}$. The known bounds can be summarized as follows:

• Over any field $\dim(V) \le O((1/\delta) \log_2 n)$ [GKST02, DS06].

• Over prime fields $\mathbb{F}_p$ we have $\dim(V) \le \mathrm{poly}(p/\delta) + O((1/\delta) \log_p n)$ [BDSS11].

• Over fields of characteristic zero (or characteristic $\gg \exp\exp(n)$) we have
$\dim(V) \le \mathrm{poly}(1/\delta)$ [BDYW11].

The proofs of all three bounds use the same basic ideas used for SG-configurations with
an extra layer of arguments added to handle repetitions in V. Since these arguments are
quite cumbersome and tailor-made for each proof, it would be very desirable to find a clean
black-box way of getting rid of repetitions for any LCC (even with more than two queries).

⁴ Random codes in the regime $\dim(V) \approx \log_q n$ are studied in [KS10] and can be shown to
have local decoding properties.

5.4.5 Constructions using polynomials 

When r > 2 our knowledge is quite limited. The best constructions are those coming from
multivariate polynomials or Reed-Muller codes [Ree54, Mul54]. We have already encountered
these when we discussed the polynomial method over finite fields. Let $\mathbb{F}$ be a field of size
q and let $n = q^m$ for some m. We will construct an $(r, \delta)$-LCC $V = (v_1, \ldots, v_n)$ in
$\mathbb{F}^n$ by describing the subspace $C_V \subset \mathbb{F}^n$ (see Eq. (5.4)). Let
$\mathbb{F}_{\le e}[z_1, \ldots, z_m]$ be the set of polynomials in m variables of degree at most e. We
identify the set of coordinates [n] with the set $\mathbb{F}^m$ using some fixed one-to-one map
$\tau : [n] \to \mathbb{F}^m$ and define

$$C_V = \{(f(\tau(1)), \ldots, f(\tau(n))) \in \mathbb{F}^n \mid f \in \mathbb{F}_{\le q-2}[z_1, \ldots, z_m]\}.$$

That is, a codeword in $C_V$ is the vector of evaluations of a polynomial of degree $\le q - 2$ on
the entire space $\mathbb{F}^m$.⁵ We will now argue that this code is a $(q - 1, \delta)$-LCC (with some
constant $\delta$). Consider an index $i \in [n]$ and its associated point $\tau(i)$. On every line in
$\mathbb{F}^m$ passing through $\tau(i)$, the values of a degree $q - 2$ polynomial in $q - 1$ places on
the line determine the rest of the values on the line. Thus, the value of a codeword at coordinate
$\tau(i)$ can be determined from any $(q - 1)$-tuple of coordinates corresponding to the points on
any line through $\tau(i)$. Since the space $\mathbb{F}^m$ can be covered completely by lines passing
through $\tau(i)$ we can find such a line outside any set $S \subset \mathbb{F}^m$ with $|S| < q^{m-1}$.
Thus, we can take $\delta = 1/q$. We can make $\delta$ independent of q by reducing the degree of the
polynomials from $q - 2$ to, say, $q/10$. Then, we only need to find a line passing through
$\tau(i)$ with at least $q/10 + 1$ points outside S. A simple probabilistic argument shows that such
a line exists if $|S| < n/10$. The dimension of the code $C_V$ described above is equal to the
number of coefficients in a degree $\approx q$ polynomial in m variables, where $m = \log_q n$. When
$r \approx q$, the number of queries, is a constant and n tends to infinity, this dimension is
roughly $m^{O(r)} = (\log_q n)^{O(r)}$. This is a power of $\log_q n$ that depends linearly on the
number of queries. Thus, these codes are quite far from being applicable in practice when we wish
to have dimension close to n (or at least polynomial in n).
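The line-based correction described above can be sketched in a few lines. This is a minimal illustration, assuming a prime field $\mathbb{F}_p$ with p = 5 and m = 2, a hypothetical polynomial of total degree p - 2 = 3, and codewords stored as dictionaries indexed directly by points of $\mathbb{F}_p^m$ (in place of the map from [n]); the function names are not from the text.

```python
import random
from itertools import product

def evaluate(coeffs, point, p):
    """Evaluate a polynomial {exponent tuple: coefficient} at a point of F_p^m."""
    total = 0
    for exps, c in coeffs.items():
        term = c
        for z, e in zip(point, exps):
            term = term * pow(z, e, p) % p
        total = (total + term) % p
    return total

def interpolate_at_zero(ts, values, p):
    """Lagrange interpolation: value at t = 0 of the polynomial of degree
    < len(ts) passing through the pairs (ts[k], values[k])."""
    result = 0
    for t, val in zip(ts, values):
        num, den = 1, 1
        for s in ts:
            if s != t:
                num = num * (-s) % p
                den = den * (t - s) % p
        result = (result + val * num * pow(den, p - 2, p)) % p  # Fermat inverse
    return result

def correct(word, point, p, rng=random):
    """Recover the value at `point` from p - 1 reads along a random line through it."""
    m = len(point)
    direction = (0,) * m
    while not any(direction):                      # need a nonzero direction
        direction = tuple(rng.randrange(p) for _ in range(m))
    ts = list(range(1, p))                         # skip t = 0, the point itself
    reads = [word[tuple((a + t * b) % p for a, b in zip(point, direction))]
             for t in ts]
    return interpolate_at_zero(ts, reads, p)

p, m = 5, 2
coeffs = {(0, 0): 3, (1, 2): 2, (3, 0): 1}         # 3 + 2*z1*z2^2 + z1^3
word = {pt: evaluate(coeffs, pt, p) for pt in product(range(p), repeat=m)}
bad = dict(word)
bad[(1, 2)] = (word[(1, 2)] + 1) % p               # corrupt one coordinate
fixed = correct(bad, (1, 2), p)
```

Because the decoder never reads the coordinate it is correcting, the corruption at `(1, 2)` is repaired regardless of which line is chosen. With errors elsewhere, the outcome depends on whether the random line avoids them.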

⁵ One can easily come up with the explicit vectors in the list $V = (v_1, \ldots, v_n)$ and this is a good exercise.

5.4.6 General upper bounds on the dimension of LCCs



When r > 2, the upper bounds on the dimension of r-query LCCs are quite weak. The
following bound works for any LCC and degrades quite quickly with the number of queries:

Theorem 5.4.6 (Katz-Trevisan [KT00]). Let $V = (v_1, \ldots, v_n)$ be an $(r, \delta)$-LCC in
$\mathbb{F}^n$. Then, when n goes to infinity and r and $\delta$ are fixed, we have

$$\dim(V) \le O\left(n^{\frac{r-1}{r}} \cdot \log n\right).$$

Proof. First, we use Lemma 5.4.3 to find r-matchings $M_1, \ldots, M_n$ in [n] of the form
$M_i = \{R_{i,1}, \ldots, R_{i,k}\}$ with $k \ge (\delta/r)n$. Recall also that each of the sets
$\{v_\ell \mid \ell \in R_{i,j}\}$ spans the vector $v_i$. We will use a probabilistic argument to find
a set $T \subset [n]$ of small size such that T will contain at least one set $R_{i,j}$ for each of
the i's. This will imply that $\dim(V) \le |T|$ since the vectors $v_i$, $i \in T$, span all of the
other vectors in V.

Consider the following random choice of T: take every element in [n] to be in T independently
with probability $\mu = \log n \cdot n^{-1/r}$. Then, with probability higher than 3/4, we will have
$|T| \le O\left(n^{\frac{r-1}{r}} \cdot \log n\right)$. We now show that w.h.p. T will contain at least
one set from each of the matchings. The probability for a single set $R_{i,j}$ of size at most r
to not be contained in T is at most $1 - \mu^r$ and, since the sets in each matching are disjoint,
for every fixed i we get

$$\Pr[\forall j \in [k],\ R_{i,j} \not\subset T] \le (1 - \mu^r)^k.$$

Plugging in $\mu$ and the bound for k we get that this probability is smaller than $1/(20n)$ and
so, the probability that there exists an $M_i$ with all sets not in T is at most 1/20 by a union
bound. This means that there exists a choice of T of the appropriate size that contains a set
from each matching. This completes the proof. □

If this bound was tight then there could be a chance to use LCCs with a constant number
of queries in practice. However, most people believe that this bound is not tight and some
conjecture that the polynomial constructions achieve the best possible parameters. The
best known general upper bound was proved by Woodruff [Woo07] and gives
$\dim(V) \le O\left(n^{1 - 1/\lceil r/2 \rceil}\right)$, which is equal to $\sqrt{n}$ for r = 3, 4. Any
improvement to either this upper bound or the polynomial constructions will be extremely
interesting. Notice that the polynomial constructions we saw work only over finite (and smaller
than n) characteristic. When the characteristic is zero (or larger than n) there are no known
constructions of constant query LCCs with dimension tending to infinity (one can always take a
code of dimension $1/\delta$ with r = 1). A tempting conjecture, which might be a good starting point
for progress, is that there are no 3-query LCCs over fields of characteristic zero.
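The 'plugging in' step in the proof of Theorem 5.4.6 can be written out as follows. This is a sketch under the assumption that the sampling probability is $\mu = \log n \cdot n^{-1/r}$ and that $k \ge (\delta/r)n$, which is how we read the proof; the use of $1 - x \le e^{-x}$ is our own way of filling in the estimate.

```latex
\begin{align*}
\Pr\big[\forall j \in [k],\ R_{i,j} \not\subset T\big]
  &\le (1 - \mu^r)^k \;\le\; \exp(-\mu^r k) \\
  &\le \exp\!\left(-\frac{\log^r n}{n} \cdot \frac{\delta}{r}\, n\right)
   \;=\; \exp\!\left(-\frac{\delta}{r} \log^r n\right)
   \;\le\; \frac{1}{20n}.
\end{align*}
```

The last inequality holds for all sufficiently large n because $r \ge 2$ and $\delta, r$ are fixed, so $(\delta/r)\log^r n$ eventually exceeds $\log(20n)$. A union bound over the n matchings then gives total failure probability at most 1/20.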
5.4.7 LCCs as low rank sparse matrices

A nice way to think about the LCC question is to translate it to a question on the rank of 
matrices with a certain zero/non-zero pattern. The following definition defines the particular 
pattern that arises in this setting. 

Definition 5.4.7 (LCC-matrix). Let A be an $nk \times n$ matrix over $\mathbb{F}$ and let
$A_1, \ldots, A_n$ be $k \times n$ matrices so that A is the concatenation of the blocks
$A_1, \ldots, A_n$ placed on top of each other (so $A_\ell$ contains the rows of A numbered
$k(\ell - 1) + 1, \ldots, k\ell$). We say that A is a $(k, r)$-LCC matrix if, for each $i \in [n]$ the
block $A_i$ satisfies the following conditions:

• Each row of $A_i$ has support size at most $r + 1$.

• All rows in $A_i$ are non-zero in position i.

• The supports of two distinct rows in $A_i$ intersect only in position i.

The connection between LCCs and LCC-matrices will be clear from the following lemma: 

Lemma 5.4.8. Let $V = (v_1, \ldots, v_n) \in (\mathbb{F}^d)^n$ be an $(r, \delta)$-LCC with $\dim(V) = d$.
Then, for $k = (\delta/r)n$, there exists a $(k, r)$-LCC matrix A with n columns and with
$\mathrm{rank}(A) \le n - d$. Conversely, suppose there exists a $(k, r)$-LCC matrix A with n columns
and with $\mathrm{rank}(A) \le n - d$. Then there exists an $(r, \delta)$-LCC $V = (v_1, \ldots, v_n)$ of
dimension $\dim(V) \ge d$ with $\delta = k/n$.

Proof. For the first direction let B be the $n \times d$ matrix whose i'th row is the vector $v_i$.
We will construct a $(k, r)$-LCC matrix A such that $A \cdot B = 0$. This will prove that the rank of
A is at most $n - d$ since the rank of B is $\dim(V) = d$. Let $M_1, \ldots, M_n$ be the r-Matchings
given by Lemma 5.4.3. Each $M_i$ can be used to define a block $A_i$ by adding a row for each
set $R_{i,j} \in M_i$. We would like this row to have support $\{i\} \cup R_{i,j}$ and to be in the
(left) kernel of B. This is possible since we know that $v_i$ is in the span of
$\{v_\ell \mid \ell \in R_{i,j}\}$. Thus, each block $A_i$ will have the required properties and we are
done.

For the other direction, if $\mathrm{rank}(A) \le n - d$ then there is a rank d matrix B of dimensions
$n \times d$ so that $A \cdot B = 0$. Let $V = (v_1, \ldots, v_n)$ be so that $v_i$ is the i'th row of B.
The structure of $A_i$ means that $v_i$ belongs to the span of k disjoint sets
$\{v_\ell \mid \ell \in R_{i,j}\}$ with $R_{i,j}$ being the support of the j'th row of $A_i$, removing
$\{i\}$. If we take any set $S \subset [n]$ of size less than $k = (k/n)n$ there will be some set
$R_{i,j}$ that has empty intersection with S (a set of size exactly k could meet all k disjoint
sets, so one error is lost here) and so V will satisfy the LCC property. □
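The first direction of the proof can be illustrated on a toy code. The sketch below is only an example under stated assumptions: the field is $\mathbb{F}_2$, the list V is the hypothetical choice of all nonzero vectors of $\mathbb{F}_2^3$ (so n = 7, d = 3, r = 2 and each matching has k = 3 pairs), and the function names are not from the text.

```python
from itertools import product

def hadamard_list(d):
    """All nonzero vectors of F_2^d; also serves as the n x d matrix B."""
    return [v for v in product(range(2), repeat=d) if any(v)]

def lcc_matrix(V):
    """Stack the blocks A_1, ..., A_n. Block i has one row per pair {a, b}
    with V[a] + V[b] = V[i]; the row is supported on {i, a, b}."""
    n = len(V)
    index = {v: j for j, v in enumerate(V)}
    rows = []
    for i, vi in enumerate(V):
        seen = {i}
        for a, va in enumerate(V):
            if a in seen:
                continue
            b = index[tuple(s ^ t for s, t in zip(va, vi))]
            seen.update((a, b))
            row = [0] * n
            row[i] = row[a] = row[b] = 1
            rows.append(row)
    return rows

def times_mod2(A, B):
    """Matrix product A * B over F_2."""
    return [[sum(x * y for x, y in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def rank_mod2(rows):
    """Rank over F_2 by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((t for t in range(rank, len(rows)) if rows[t][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for t in range(len(rows)):
            if t != rank and rows[t][col]:
                rows[t] = [a ^ b for a, b in zip(rows[t], rows[rank])]
        rank += 1
    return rank

V = hadamard_list(3)
A = lcc_matrix(V)
product_AB = times_mod2(A, V)
rank_A = rank_mod2(A)
```

Every row of A encodes one dependency $v_i + v_a + v_b = 0$, so `product_AB` is the zero matrix and `rank_A` cannot exceed n - d = 4, in line with the lemma.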

Thus, proving upper bounds on the dimension of LCCs is equivalent to proving lower
bounds on the rank of LCC-matrices. Over fields of characteristic zero one can try using the
results we saw on the rank of design matrices. This will work if the LCC matrix obtained
from the code happens to be a design matrix. The only families of codes we know, those
based on Reed-Muller codes, clearly satisfy this requirement (since every two points define a
single line). Thus, we can use the bound on the rank of design matrices to show that there
are no constant query LCCs over fields of characteristic zero that 'look like' Reed-Muller
codes or, more generally, whose decoding r-tuples satisfy a design condition (i.e., that every
pair of coordinates belongs to a small number of r-tuples used by the decoder). Clearly, one
can construct artificial examples of LCCs whose decoding structure is not a design (simply
repeat each coordinate twice). However, it is not out of the question to try and show that
every LCC can be 'modified' in some way to give a design-based LCC with comparable query
complexity and dimension.

We conclude this section by mentioning a weaker type of local codes called Locally Decodable
Codes (LDCs). These codes only require that the local decoding will be done for some
basis $v_1, \ldots, v_d$ of the span of V. This type of decoding does not correct every symbol of the
codeword but rather only symbols of the message. When r > 2 there are constructions of
LDCs that significantly outperform polynomial codes. See for example the excellent survey
[Yek11].